vault backup: 2026-04-27 11:11:54

Vadym Samoilenko 2026-04-27 11:11:54 +01:00
parent e128d9e58a
commit 41e0ee3ea1
13 changed files with 890 additions and 1 deletions

View file

@ -36,5 +36,10 @@ This 3-hop pattern works for hundreds of articles without vector search.
| [[wiki/reports/_index\|reports/]] | Weekly and monthly knowledge base summaries | 0 |
| [[wiki/infrastructure/_index\|infrastructure/]] | Server inventory: all 10 SSH hosts — optical, optical-dev, baic, librechat, modocmms, box-cli, aimpress, pve | 9 |
| [[wiki/architecture/ai-cost-tracker\|architecture: ai-cost-tracker]] | Shared AI cost tracking service — architecture, Workspace→Team→Project, preflight+record SDK, LiteLLM pricing | 1 |
| [[wiki/tech-patterns/cost-tracker-integration\|tech-patterns: cost-tracker-integration]] | Integration playbook for any Oliver project connecting to ai-cost-tracker (9-step guide + troubleshooting) | 1 |
| [[wiki/tech-patterns/cost-tracker-pricing-sources\|tech-patterns: cost-tracker-pricing-sources]] | Three-layer pricing pipeline, highest priority first: admin override > YAML > LiteLLM; historical effective_from/to | 1 |
| [[wiki/tech-patterns/cost-tracker-providers\|tech-patterns: cost-tracker-providers]] | Billing units per provider: Gemini usage_metadata, ElevenLabs/GCP TTS len(text), future OpenAI/Anthropic | 1 |
<!-- New topic folders added here automatically as they are created -->
<!-- Format: | [[wiki/topic/_index\|topic/]] | One-line description | N articles | -->

View file

@ -29,4 +29,5 @@ Cross-cutting architectural decisions that appear in multiple Oliver projects.
5. **DEV_AUTH_BYPASS** — skip Azure AD in local dev, always use real auth in production
| [[wiki/architecture/optical-dev-server-deploy\|optical-dev-server-deploy]] | optical-dev GCP server: single-vhost Apache, Include pattern, port table, deploy script cache | Barclays Banner Builder, all Oliver projects |
| [[wiki/architecture/ai-cost-tracker\|ai-cost-tracker]] | Shared AI cost tracking service: Workspace→Team→Project, LiteLLM pricing, preflight+record SDK, hard limits | All Oliver projects |

View file

@ -0,0 +1,78 @@
---
title: AI Cost Tracker — Architecture
tags: [architecture, ai, cost-tracking, oliver-platform]
created: 2026-04-27
updated: 2026-04-27
---
# AI Cost Tracker
Centralised **shared service** that tracks AI API spend across all Oliver projects. Every project that calls Gemini, ElevenLabs, Google Cloud TTS, or other paid AI APIs sends usage events here.
## Why it exists
Oliver runs multiple projects (video-accessibility, One2Edit, Box-pipelines …) all consuming paid AI APIs. Without centralised tracking: no visibility into total spend, no per-client cost attribution, no budget enforcement.
## Architecture diagram
```
┌──────────────────────────────────┐         ┌────────────────────────────────────┐
│  Any Oliver Project              │         │  ai-cost-tracker                   │
│  (e.g. video-accessibility)      │  POST   │  git@bitbucket.org:zlalani/...     │
│                                  │  /v1/   │                                    │
│  AI call sites                   │ preflt  │  FastAPI + MongoDB + Redis + React │
│  (Gemini, ElevenLabs, GCP TTS)   │────────►│  POST /v1/preflight                │
│        │                         │         │  POST /v1/usage/record             │
│  oliver_cost_tracker SDK         │         │  POST /v1/users/upsert             │
│  - preflight(estimate)           │◄────────│  POST /v1/projects/upsert          │
│  - record(actual)                │         │                                    │
│  - SQLite outbox + retry         │         │  Admin UI (Microsoft SSO)          │
└──────────────────────────────────┘         │  Workspaces / Teams / Projects     │
                                             │  Pricing + LiteLLM auto-sync       │
                                             │  Budgets + email alerts            │
                                             │  Dashboard + Pivot analytics       │
                                             └────────────────────────────────────┘
```
## Key design decisions
| Decision | Choice | Why |
|---|---|---|
| Deployment | Separate repo + separate server | Clean isolation, independent scaling |
| Org hierarchy | Workspace → Team → Project | Matches Oliver agency structure |
| User ownership | Each project owns users; lazy mirror in shared | No SSO migration needed |
| Pricing | LiteLLM auto-sync + YAML (non-LLM) + admin override | Auto-updated for LLMs, manual for chars |
| Transport | Sync HTTP + SQLite outbox fallback | Never breaks the AI pipeline |
| Budget enforcement | Hard limits via preflight check | `allow=false` before call is made |
| Auth (projects) | API key per project | Simple, revocable, auditable |
| Auth (admins) | Microsoft SSO | Consistent with all Oliver projects |
## Org hierarchy
```
Workspace (e.g. "Ford", "H&M", "Oliver Internal")
└── Team (e.g. "Video", "Localization", "QC")
    └── Project (e.g. "Ford Q3 Campaign", "H&M Spring 2026")
        └── Job / event (individual AI call)
```
Users live in each project (video-accessibility, etc.) and are **lazily mirrored** into the cost-tracker when their first usage event arrives.
## Tech stack
Mirrors video-accessibility for team familiarity:
- Backend: **FastAPI + MongoDB Atlas + Redis + Celery**
- Frontend: **React 18 + Vite (TypeScript)**
- Auth admin: **Microsoft AAD (MSAL)**
- Charts: **recharts**
- Tables/pivot: **@tanstack/react-table**
## Related articles
- [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — how to connect a new project
- [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]] — how pricing is maintained
- [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] — billing units per AI provider
- [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] — the core usage-tracking pattern
- [[wiki/concepts/lazy-user-mirror|lazy-user-mirror]] — how user sync works
- [[wiki/concepts/sync-with-outbox|sync-with-outbox]] — resilient HTTP calls with SQLite fallback
- [[wiki/projects-overview/ai-cost-tracker|ai-cost-tracker (project card)]] — project registry card

View file

@ -50,5 +50,10 @@
| [[wiki/concepts/fish-fisher-conf-d-conflict]] | Fisher plugin manager conflict when conf.d/ files are manually copied — delete and run fisher update to reinstall cleanly | daily/2026-04-22.md | 2026-04-22 |
| [[wiki/concepts/macos-python-version-hooks]] | macOS system /usr/bin/python3 = 3.9; Path\|None syntax requires 3.10+; Claude Code hooks must use /opt/homebrew/bin/python3 | daily/2026-04-22.md | 2026-04-22 |
| [[wiki/concepts/preflight-record-pattern]] | preflight(estimate) → AI call → record(actual) — the 3-step AI cost-tracking pattern with budget enforcement | ai-cost-tracker | 2026-04-27 |
| [[wiki/concepts/lazy-user-mirror]] | User mirror created on first AI event, not on user creation — minimal integration surface, source project stays owner | ai-cost-tracker | 2026-04-27 |
| [[wiki/concepts/sync-with-outbox]] | Sync HTTP + SQLite outbox: record() never blocks the AI pipeline; background flusher retries up to 10x | ai-cost-tracker | 2026-04-27 |
| [[wiki/concepts/litellm-pricing-source]] | LiteLLM model_prices JSON as auto-updating LLM price source — why scraping provider sites is fragile | ai-cost-tracker | 2026-04-27 |
<!-- Articles added automatically by compile.py -->
<!-- Format: | [[concepts/slug]] | One-line summary | daily/YYYY-MM-DD.md | date | -->

View file

@ -0,0 +1,58 @@
---
title: Lazy User Mirror
tags: [concept, architecture, cost-tracking, users]
created: 2026-04-27
updated: 2026-04-27
---
# Lazy User Mirror
The cost-tracker maintains a read-only mirror of users from each connected project. "Lazy" means the mirror record is created **on first use**, not when the user is created in the source project.
## Why lazy (not eager)?
- Source projects own their users; we don't want to add mandatory webhooks to every user CRUD operation
- A user who never triggers an AI call doesn't need to exist in the mirror
- Keeps integration surface minimal — just send `user_external_id` with every event
## How it works
1. When `POST /v1/preflight` or `POST /v1/usage/record` arrives with a `user_external_id` not yet in `users_mirror` → cost-tracker **auto-creates** a mirror record using `email`, `full_name`, and `role` from the request payload.
2. The SDK sends these on every call:
```python
await ct.preflight(
    user_external_id=user.id,
    user_email=user.email,          # for mirror creation
    user_full_name=user.full_name,  # for analytics display
    user_role=user.role,            # for RBAC in analytics
    ...
)
```
3. `users_mirror` record fields: `source_app`, `external_id`, `email`, `full_name`, `role`, `workspace_id` (from project lookup), `team_id`, `first_seen_at`, `last_seen_at`.
4. Subsequent calls with the same `user_external_id` update `last_seen_at` only — no overwrite of profile fields. Profile changes are pushed via explicit `POST /v1/users/upsert`.
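
The create-on-first-use rule in the steps above can be sketched as follows. This is a minimal illustration, assuming an in-memory dict in place of the real `users_mirror` collection; `MirrorUser` and `touch_user` are hypothetical names, not the service's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MirrorUser:
    source_app: str
    external_id: str
    email: str
    full_name: str
    role: str
    first_seen_at: datetime
    last_seen_at: datetime


# Hypothetical in-memory stand-in for the `users_mirror` collection.
users_mirror: dict[tuple[str, str], MirrorUser] = {}


def touch_user(source_app, external_id, email, full_name, role, now=None) -> MirrorUser:
    """Create the mirror record on first sight; afterwards only bump last_seen_at."""
    now = now or datetime.now(timezone.utc)
    key = (source_app, external_id)
    user = users_mirror.get(key)
    if user is None:
        user = MirrorUser(source_app, external_id, email, full_name, role, now, now)
        users_mirror[key] = user
    else:
        # Profile fields are left untouched; they only change via an explicit upsert.
        user.last_seen_at = now
    return user
```

Keying on `(source_app, external_id)` is an assumption that keeps IDs from different source projects from colliding.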
## Explicit sync for profile changes
If a user's email, role, or name changes in the source project:
```python
await ct.upsert_user(
    external_id=user.id,
    email=user.email,
    full_name=user.full_name,
    role=user.role,
    project_external_id=...,
)
```
In video-accessibility this is called from:
- `POST /admin/users` — on user creation (`routes_admin.py:101`)
- Login flow — on every successful login (keeps mirror fresh)
## Analytics implications
- Users without any AI events never appear in the mirror → analytics shows only users who actually consumed AI
- Orphan users (no `project_external_id` provided) are mirrored but show `workspace_id=null` until assigned via bulk action in admin UI

View file

@ -0,0 +1,59 @@
---
title: LiteLLM as Pricing Source
tags: [concept, ai, cost-tracking, pricing, llm]
created: 2026-04-27
updated: 2026-04-27
---
# LiteLLM as Pricing Source
The AI Cost Tracker uses the open-source **LiteLLM model prices JSON** as the primary source of truth for LLM pricing. This eliminates the need to scrape provider websites.
## What is LiteLLM?
[LiteLLM](https://github.com/BerriAI/litellm) is an open-source Python library (30k+ GitHub stars) for calling 100+ LLM providers with a unified interface. It maintains a community-curated `model_prices_and_context_window.json` covering Gemini, OpenAI, Anthropic, Cohere, Mistral, Together AI, and many others.
## Why not scrape provider websites directly?
| Problem | Impact |
|---|---|
| Pricing pages are React SPAs | Need headless browser; brittle |
| Layout changes without notice | Breaks silently; wrong costs logged |
| Different billing units per provider | Complex parsing; easy to get wrong |
| Tier/volume discounts in HTML | Nearly impossible to parse reliably |
| ToS may prohibit scraping | Legal risk |
LiteLLM maintains all of this in a single structured JSON — battle-tested by thousands of production deployments.
## The JSON structure
```json
{
  "gemini/gemini-3-pro-preview": {
    "input_cost_per_token": 0.00000125,
    "output_cost_per_token": 0.000005,
    "litellm_provider": "gemini",
    "mode": "chat",
    "max_tokens": 65536
  }
}
```
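
As a worked example of how these per-token fields turn into a cost, here is a small sketch. The function name `llm_cost_usd` is hypothetical, and the prices are the illustrative values from the JSON example, not verified current rates.

```python
def llm_cost_usd(price: dict, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call from LiteLLM-style per-token prices (USD per token)."""
    return (
        input_tokens * price["input_cost_per_token"]
        + output_tokens * price["output_cost_per_token"]
    )


# Illustrative price entry, using the values from the JSON example.
price = {"input_cost_per_token": 0.00000125, "output_cost_per_token": 0.000005}
cost = llm_cost_usd(price, input_tokens=1000, output_tokens=200)
print(f"{cost:.6f}")  # 0.002250
```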
## What LiteLLM does NOT cover
- **ElevenLabs** — not an LLM; character-based billing
- **Google Cloud TTS** — not an LLM; character-based billing
- **Self-hosted models** — no external billing
These are defined in `pricing/models.yaml` in the cost-tracker repo. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].
## Keeping prices accurate
1. Daily Celery beat task (`tasks/pricing_sync.py`) fetches the latest JSON
2. If a price changes → admin gets notified; new price record created with `effective_from=today`
3. Old price records kept forever for historical reporting
4. To freeze at a known-good version: set `LITELLM_COMMIT_HASH` env var
## The alternative considered
Direct website scraping was evaluated and rejected due to the problems listed above. LiteLLM is the standard community solution for this exact use case.

View file

@ -0,0 +1,59 @@
---
title: Preflight + Record Pattern
tags: [concept, ai, cost-tracking, patterns]
created: 2026-04-27
updated: 2026-04-27
---
# Preflight + Record Pattern
The core usage-tracking pattern used by the AI Cost Tracker SDK. Every paid AI call follows the same three steps.
## The pattern
```
preflight(estimated_units) → call AI → record(actual_units)
```
1. **Preflight** (before the AI call): ask the cost-tracker, "Is this workspace/project within budget?"
   - Input: model name + estimated units (tokens, chars, etc.)
   - Output: `allow=true/false`, estimated cost, `request_id`
   - If `allow=false` → raise `BudgetExceeded` before calling the AI API
2. **AI call**: the actual paid API call (unmodified)
3. **Record** (after the call): report actual usage
   - Input: `request_id` from preflight + actual units from response
   - Output: `event_id`, `cost_usd`
   - If cost-tracker is unavailable → SDK saves to SQLite outbox and retries in background (see [[wiki/concepts/sync-with-outbox|sync-with-outbox]])
## Why two steps?
- **Preflight** enables hard budget enforcement **before** money is spent
- **Record** captures accurate actual usage (estimated ≠ actual for output tokens)
- Decoupling protects the AI pipeline: if cost-tracker goes down after preflight, `record()` still succeeds via outbox
## Estimation accuracy
Preflight uses estimated units because output token count is unknown before the call:
| Provider | What we estimate | Accuracy |
|---|---|---|
| Gemini text | input tokens (`len/4`), output tokens (caller hint) | ±30% |
| Gemini video | input tokens (file-size table), output tokens (hint) | ±50% |
| ElevenLabs | chars (exact — `len(text)`) | 100% |
| Google TTS | chars (exact — `len(text)`) | 100% |
Over-estimation is better than under-estimation for budget enforcement. If you consistently over-estimate by 50%, tune the default `estimated_output_tokens` hint downward.
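
One possible way to tune the hint, sketched under assumptions: take the median of recent actual output-token counts and add headroom, so the estimate still errs on the high side. The helper name and the 20% headroom are hypothetical, not part of the SDK.

```python
from statistics import median


def suggest_output_hint(current_hint: int, actual_output_tokens: list[int],
                        headroom: float = 1.2) -> int:
    """Suggest a new default hint: median actual usage plus some headroom."""
    if not actual_output_tokens:
        return current_hint  # nothing observed yet, keep the current default
    return int(median(actual_output_tokens) * headroom)
```

For instance, with a current hint of 2048 and observed outputs of around 1,000 tokens, the suggestion drops to about 1,200.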
## Hard limit mechanics
- Preflight computes `current_month_spend + estimated_cost`
- If this exceeds `budget.amount_usd` AND `budget.hard_limit=True` → `allow=false`
- The budget check is **eventual** (reads from pre-aggregated rollups + today's raw events), not transactional — brief overage is possible under high concurrency
- This is acceptable for AI cost tracking: exact-to-the-cent enforcement would require distributed locks and add unacceptable latency
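
The decision rule in these bullets can be written as a single predicate. This is a sketch only: `Budget` and `preflight_allow` are hypothetical names, and the real service reads spend from rollups rather than taking it as an argument.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    amount_usd: float
    hard_limit: bool


def preflight_allow(current_month_spend: float, estimated_cost: float,
                    budget: Budget) -> bool:
    """Deny only when a hard-limit budget would be exceeded by this call."""
    projected = current_month_spend + estimated_cost
    return not (budget.hard_limit and projected > budget.amount_usd)
```

A budget without `hard_limit` never blocks a call here; it would only drive alerts.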
## Projects using this pattern
- [[wiki/projects-overview/ai-cost-tracker|ai-cost-tracker]] (defines the pattern)
- video-accessibility (first consumer, Phase 1)

View file

@ -0,0 +1,67 @@
---
title: Sync HTTP + SQLite Outbox Pattern
tags: [concept, architecture, resilience, cost-tracking]
created: 2026-04-27
updated: 2026-04-27
---
# Sync HTTP + SQLite Outbox Pattern
The pattern used by the cost-tracker SDK to ensure usage events are **never lost** even if the cost-tracker service is temporarily unavailable.
## The problem
The AI pipeline (Celery worker) calls `ct.record(...)` after each AI API call. If the cost-tracker service is down, a naive implementation would either:
- Silently drop the event (cost data lost)
- Raise an exception (AI pipeline fails — unacceptable)
## The solution
```
record() → try POST /v1/usage/record
             ├── success → done
             └── failure (timeout / 5xx / network) → save to SQLite outbox
                                                       background flusher (every 30s)
                                                       retries all pending events with
                                                       exponential backoff
```
## Implementation details
**SQLite outbox** (one file per worker, default `/tmp/cost_outbox.sqlite`):
- Schema: `(id, ts, payload_json, attempts, last_attempt_at, status)`
- Written synchronously before returning from `record()` on failure
- Never blocks the AI pipeline
**Background flusher** (asyncio background task):
- Starts when `CostTracker` is initialised
- Every 30 seconds: reads all `status='pending'` rows, retries `POST /v1/usage/record`
- On success: marks `status='sent'`
- After 10 failed attempts: marks `status='dead'`, logs warning → human investigation needed
**Graceful degradation:**
- `record()` never raises `CostTrackerUnavailable` — it's fire-and-forget via outbox
- `preflight()` returns `allow=true` on connectivity failure by default (`fail_open=True`). Configurable.
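
A minimal sketch of the outbox table and one flusher pass, using only the standard library. The column names follow the schema listed above; the function names, the `send` callback, and the omission of backoff timing are simplifications of this sketch, not the SDK's actual implementation.

```python
import json
import sqlite3
import time

MAX_ATTEMPTS = 10  # mirrors max_retry_attempts


def open_outbox(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS outbox (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,
               payload_json TEXT NOT NULL,
               attempts INTEGER NOT NULL DEFAULT 0,
               last_attempt_at REAL,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return db


def enqueue(db: sqlite3.Connection, payload: dict) -> None:
    """Store a failed record() payload so it can be retried later."""
    db.execute(
        "INSERT INTO outbox (ts, payload_json) VALUES (?, ?)",
        (time.time(), json.dumps(payload)),
    )
    db.commit()


def flush_once(db: sqlite3.Connection, send) -> None:
    """One flusher pass: retry every pending row via send(payload) -> bool."""
    rows = db.execute(
        "SELECT id, payload_json, attempts FROM outbox WHERE status = 'pending'"
    ).fetchall()
    for row_id, payload_json, attempts in rows:
        try:
            ok = send(json.loads(payload_json))
        except Exception:
            ok = False  # treat any transport error as a failed attempt
        attempts += 1
        if ok:
            status = "sent"
        elif attempts >= MAX_ATTEMPTS:
            status = "dead"  # needs human investigation
        else:
            status = "pending"
        db.execute(
            "UPDATE outbox SET attempts = ?, last_attempt_at = ?, status = ? WHERE id = ?",
            (attempts, time.time(), status, row_id),
        )
    db.commit()
```

In a real flusher, `flush_once` would be called on a timer and `send` would wrap the `POST /v1/usage/record` call.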
## Configuration
```python
ct = CostTracker(
    ...
    outbox_path="/tmp/cost_outbox.sqlite",
    flush_interval_seconds=30,
    max_retry_attempts=10,
    fail_open=True,  # preflight returns allow=True when service unreachable
)
```
## Monitoring
- Outbox depth reported in SDK's `/metrics` endpoint (if enabled)
- `dead` status rows require manual review — add to monitoring alert
## Where this pattern applies
- Any Oliver project using the `oliver-cost-tracker` SDK
- Generally applicable to any fire-and-forget side-effect call where data loss is unacceptable but the consumer must not block the main flow

View file

@ -0,0 +1,50 @@
---
title: AI Cost Tracker — Project Card
tags: [projects, oliver-platform, ai, cost-tracking]
created: 2026-04-27
updated: 2026-04-27
---
# AI Cost Tracker
| Field | Value |
|---|---|
| **Repo** | `git@bitbucket.org:zlalani/ai-cost-tracker.git` |
| **Stack** | FastAPI + MongoDB Atlas + Redis + Celery + React 18 + Vite |
| **Auth (admin)** | Microsoft AAD (MSAL) |
| **Auth (API)** | API key per connected project |
| **Status** | Phase 1 — building (April 2026) |
| **First consumer** | video-accessibility |
## What it does
Centralised AI cost tracking for all Oliver projects. Every project sends preflight + record events; cost-tracker aggregates, stores, and presents analytics by workspace / team / project / user / model.
See [[wiki/architecture/ai-cost-tracker|ai-cost-tracker architecture]] for full architecture details.
## Key URLs
- Admin UI: TBD (e.g. `https://cost.oliver.agency`)
- API base: `https://cost.oliver.agency/v1`
- Health: `https://cost.oliver.agency/v1/health`
## Connected projects
| Project | Source app name | Connected since |
|---|---|---|
| video-accessibility | `video-accessibility` | Phase 1 (April 2026) |
## How to connect a new project
See [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — full step-by-step guide.
## Related articles
- [[wiki/architecture/ai-cost-tracker|Architecture overview]]
- [[wiki/tech-patterns/cost-tracker-integration|Integration guide (new projects)]]
- [[wiki/tech-patterns/cost-tracker-pricing-sources|Pricing sources]]
- [[wiki/tech-patterns/cost-tracker-providers|Provider billing units]]
- [[wiki/concepts/preflight-record-pattern|Preflight + Record pattern]]
- [[wiki/concepts/lazy-user-mirror|Lazy user mirror]]
- [[wiki/concepts/sync-with-outbox|Sync HTTP + outbox pattern]]
- [[wiki/concepts/litellm-pricing-source|LiteLLM as pricing source]]

View file

@ -24,6 +24,9 @@ Recurring technology stacks used across Oliver Agency projects. Each article cov
| [[wiki/tech-patterns/one2edit-api\|one2edit-api]] | One2Edit translation platform API | 3M Portal, H&M O2E Tool |
| [[wiki/tech-patterns/nodejs-vanilla-proxy\|nodejs-vanilla-proxy]] | Node.js + Vanilla JS lightweight proxy tools | 3M Portal, Ferrero, Homepage |
| [[wiki/tech-patterns/kling-veo-video-api\|kling-veo-video-api]] | Kling AI + Google Veo 3.1 video generation — camera control, I2V, polling | Cinema Studio Pro |
| [[wiki/tech-patterns/cost-tracker-integration\|cost-tracker-integration]] | Step-by-step guide: connect any Oliver project to ai-cost-tracker (API key, SDK install, wrap AI calls, budgets) | All Oliver projects |
| [[wiki/tech-patterns/cost-tracker-pricing-sources\|cost-tracker-pricing-sources]] | Three-layer pricing pipeline, highest priority first: admin override > YAML (non-LLM) > LiteLLM auto-sync; historical effective_from/to | ai-cost-tracker |
| [[wiki/tech-patterns/cost-tracker-providers\|cost-tracker-providers]] | Billing units per AI provider: Gemini tokens (usage_metadata), ElevenLabs chars, Google TTS chars | All AI projects |
## Quick Decision Guide

View file

@ -0,0 +1,232 @@
---
title: AI Cost Tracker — Integrating a New Project
tags: [how-to, ai, cost-tracking, integration]
created: 2026-04-27
updated: 2026-04-27
---
# Integrating a New Project with AI Cost Tracker
Step-by-step guide for connecting any Oliver backend project to the shared cost-tracker.
## Prerequisites
- You have a Bitbucket account with access to `zlalani/ai-cost-tracker`
- You can reach the cost-tracker admin UI (ask for the domain)
- Your project is a FastAPI + Python backend (adaptable for other stacks)
---
## Step 1 — Get an API key
1. Open the cost-tracker Admin UI → **API Keys** → **Create key**
2. Name it after your project (e.g. `video-accessibility-prod`)
3. Scope: `preflight`, `record`, `upsert`
4. Copy the key — it is shown **only once**
5. Store in your project's GCP Secret Manager (same pattern as `GEMINI_API_KEY`)
---
## Step 2 — Install the SDK
```bash
pip install oliver-cost-tracker
# or, while the private package isn't published yet:
# git submodule add git@bitbucket.org:zlalani/ai-cost-tracker.git vendor/cost-tracker
# pip install -e vendor/cost-tracker/sdk/
```
---
## Step 3 — Add environment variables
```env
COST_TRACKER_BASE_URL=https://cost.oliver.agency
COST_TRACKER_API_KEY=ct_live_xxxxxxxxxxxxxxxxxxxx
COST_TRACKER_SOURCE_APP=video-accessibility
COST_TRACKER_OUTBOX_PATH=/tmp/cost_outbox.sqlite
```
---
## Step 4 — Initialise the client
In `core/dependencies.py` (FastAPI):
```python
from functools import lru_cache

from oliver_cost_tracker import CostTracker


@lru_cache
def get_cost_tracker() -> CostTracker:
    return CostTracker(
        base_url=settings.cost_tracker_base_url,
        api_key=settings.cost_tracker_api_key,
        source_app=settings.cost_tracker_source_app,
        outbox_path=settings.cost_tracker_outbox_path,
    )
```
In Celery workers, instantiate once at module level:
```python
cost_tracker = CostTracker(
    base_url=settings.cost_tracker_base_url,
    api_key=settings.cost_tracker_api_key,
    source_app="video-accessibility",
    outbox_path="/tmp/cost_outbox.sqlite",
)
```
---
## Step 5 — Wrap AI calls (preflight → call → record)
This is the **core pattern**. Every paid AI call follows three steps.
See [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] for the full explanation.
```python
import time

from google.genai.types import GenerateContentResponse
from oliver_cost_tracker import BudgetExceeded, CostTracker

# `client` is assumed to be an async google-genai client created elsewhere
# (for example `genai.Client(...).aio`).


async def call_gemini_with_tracking(
    ct: CostTracker,
    prompt: str,
    user_id: str,
    job_id: str,
    project_id: str | None = None,
) -> GenerateContentResponse:
    # 1. Preflight: checks budget, returns allow/deny
    preflight = await ct.preflight(
        user_external_id=user_id,
        project_external_id=project_id,
        job_external_id=job_id,
        model="gemini-3-pro-preview",
        estimated_input_tokens=ct.estimate_tokens(prompt),
        estimated_output_tokens=2048,  # conservative overestimate
    )
    if not preflight.allow:
        raise BudgetExceeded(preflight.deny_reason)

    # 2. AI call
    t0 = time.monotonic()
    response = await client.models.generate_content(
        model="gemini-3-pro-preview",
        contents=prompt,
    )
    elapsed_ms = int((time.monotonic() - t0) * 1000)

    # 3. Record actual usage
    await ct.record(
        request_id=preflight.request_id,
        user_external_id=user_id,
        project_external_id=project_id,
        job_external_id=job_id,
        model="gemini-3-pro-preview",
        input_tokens=response.usage_metadata.prompt_token_count,
        output_tokens=response.usage_metadata.candidates_token_count,
        latency_ms=elapsed_ms,
        status="success",
    )
    return response
```
For providers without usage metadata (ElevenLabs, Google Cloud TTS), compute units from input:
```python
await ct.record(
    ...
    model="eleven_multilingual_v2",
    chars=len(text),  # billing unit is characters
    latency_ms=elapsed_ms,
    status="success",
)
```
See [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] for all provider details.
---
## Step 6 — Attribution in background tasks (Celery)
Celery tasks typically receive only `job_id`. Fetch `user_id` and `project_id` from the job document:
```python
import asyncio


@celery_app.task
def my_ai_task(job_id: str):
    # Celery task functions are synchronous, so run the async body to completion here.
    asyncio.run(_run_ai_task(job_id))


async def _run_ai_task(job_id: str):
    job = await db.jobs.find_one({"_id": job_id}, {"client_id": 1, "project_id": 1})
    user_id = job["client_id"]
    project_id = job.get("project_id")  # may be None for old jobs
    result = await call_gemini_with_tracking(
        ct=cost_tracker,
        prompt=build_prompt(job),
        user_id=user_id,
        job_id=job_id,
        project_id=project_id,
    )
```
For per-cue Celery tasks (like `tts_synthesis.py`), add `user_id` and `project_id` to the task kwargs to avoid an extra DB fetch per cue.
---
## Step 7 — Create Workspace / Team / Project in Admin UI
Before going live, set up the org structure in the cost-tracker Admin UI:
1. **Workspace** → Create (e.g. "Ford", "H&M", or "video-accessibility" for internal use)
2. **Team** → Create under the workspace (e.g. "Video Production")
3. **Project** → Create under the team, set `source_app` = your project name and `external_id` = the ID your project will send in `project_external_id`
> Jobs sent before a project is created appear as **Unassigned** in the dashboard. You can reassign them in bulk later.
---
## Step 8 — Set budgets and alerts
In Admin UI → **Budgets** → Create:
- `scope_type`: workspace / team / project
- `amount_usd`: monthly limit
- `alert_thresholds`: [0.5, 0.8, 1.0] (email at 50%, 80%, 100%)
- `hard_limit`: true = preflight returns `allow=false` when exceeded
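
How the thresholds could translate into alerts, as a sketch: an alert fires for each threshold that the latest spend increase crosses. The source does not describe the alerting logic, so the function name and the "fire once on crossing" behaviour are assumptions.

```python
def crossed_thresholds(spend_before: float, spend_after: float, amount_usd: float,
                       thresholds=(0.5, 0.8, 1.0)) -> list[float]:
    """Return the thresholds crossed when spend moves from spend_before to spend_after."""
    return [
        t for t in thresholds
        if spend_before < t * amount_usd <= spend_after
    ]
```

With a 100 USD budget, moving from 45 to 85 USD in one step would cross both the 50% and 80% thresholds.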
---
## Step 9 — Smoke test
```bash
# 1. Check service is up
curl https://cost.oliver.agency/v1/health
# 2. Preflight
curl -X POST https://cost.oliver.agency/v1/preflight \
-H "X-API-Key: ct_live_xxx" \
-H "Content-Type: application/json" \
-d '{"user_external_id":"test-user","model":"gemini-3-pro-preview","estimated_units":{"input_tokens":1000,"output_tokens":200}}'
# expect: {"allow":true,"estimated_cost_usd":...,"request_id":"..."}
# 3. Record
curl -X POST https://cost.oliver.agency/v1/usage/record \
-H "X-API-Key: ct_live_xxx" \
-H "Content-Type: application/json" \
-d '{"request_id":"<from above>","user_external_id":"test-user","model":"gemini-3-pro-preview","units":{"input_tokens":987,"output_tokens":180},"latency_ms":1200,"status":"success"}'
# expect: {"event_id":"...","cost_usd":0.00214}
```
Then open the Admin UI Dashboard — the test event should appear within seconds.
---
## Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| `preflight` returns 401 | Wrong or missing API key | Check `COST_TRACKER_API_KEY` env var; verify key is active in Admin UI |
| `preflight` returns `allow=false` | Budget exceeded | Admin UI → Budgets → raise limit or wait for next billing period |
| Events not appearing in dashboard | Outbox accumulating (service down) | Check `/tmp/cost_outbox.sqlite`; the SDK's background flusher retries automatically once the service is back up |
| `cost_usd=null` on events | Model not in pricing table | Admin UI → Pricing → add model, or check LiteLLM sync task ran |
| Slow preflight (>500ms) | cost-tracker under load or network | SDK retries automatically; if persistent, check service metrics |
| Token estimates wildly off | Char/4 heuristic for video prompts | Gemini video needs file_size-based lookup; see [[wiki/tech-patterns/cost-tracker-providers\|cost-tracker-providers]] |

View file

@ -0,0 +1,131 @@
---
title: AI Cost Tracker — Pricing Sources
tags: [how-to, ai, cost-tracking, pricing]
created: 2026-04-27
updated: 2026-04-27
---
# AI Cost Tracker — Pricing Sources
The cost-tracker uses a **three-layer hybrid pricing pipeline**. Understanding the priority order is essential for accurate billing attribution.
## Priority order
```
override (highest — set manually by admin)
yaml (fallback — versioned in repo for non-LLM providers)
litellm (lowest — auto-synced daily from open source)
```
`compute_cost(provider, model, units, ts)` returns the cost using the **highest-priority active price** for the given model at timestamp `ts`.
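
A sketch of the price lookup that `compute_cost` would need, assuming each price record carries a source and a validity window. `PriceRecord`, `active_price`, and the exclusive `effective_to` bound are assumptions of this sketch, not the service's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Higher number wins; ordering taken from the priority list above.
SOURCE_PRIORITY = {"litellm": 0, "yaml": 1, "override": 2}


@dataclass
class PriceRecord:
    model: str
    source: str
    price_per_unit_usd: float
    effective_from: date
    effective_to: Optional[date] = None  # None means still active


def active_price(records: list[PriceRecord], model: str, on: date) -> Optional[PriceRecord]:
    """Pick the highest-priority record whose validity window contains `on`."""
    candidates = [
        r for r in records
        if r.model == model
        and r.effective_from <= on
        and (r.effective_to is None or on < r.effective_to)
    ]
    if not candidates:
        return None  # the caller would flag the event as having no price
    return max(candidates, key=lambda r: (SOURCE_PRIORITY[r.source], r.effective_from))
```

Because the lookup takes the event date, a call made before an override's `effective_from` still resolves to the older price.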
---
## Layer 1 — LiteLLM auto-sync (LLM providers)
**Source:** `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
**Coverage:** Gemini, OpenAI, Anthropic, Cohere, Mistral, Together, and 100+ others.
**Sync schedule:** Celery beat task `tasks/pricing_sync.py` runs **daily at 02:00 UTC**.
**What happens on sync:**
1. Fetches the JSON (from `main`, or from the commit pinned in the `LITELLM_COMMIT_HASH` env var when it is set)
2. Maps `input_cost_per_token` / `output_cost_per_token` to our schema
3. For each model:
   - If no existing price → creates new `model_prices` record with `source="litellm"`
   - If price unchanged → updates `litellm_commit_hash`, no other change
   - If price **changed** → closes old record (`effective_to=today`), creates new record, sends **admin notification email**
> **Note:** Auto-price changes never silently modify `source="override"` records. If you have an override active, the sync logs a divergence warning but leaves your override intact.
**To pin a specific version** (for reproducibility):
```env
LITELLM_COMMIT_HASH=abc123def456 # pin to a known-good commit
```
See [[wiki/concepts/litellm-pricing-source|litellm-pricing-source]] for deeper explanation.
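
The per-model change detection described in this section can be sketched as a pure function. This is a simplified illustration: `current` is assumed to hold only the active `source="litellm"` records as plain dicts, a single price per model stands in for the input/output pair, and the commit-hash update and email are left out.

```python
from datetime import date


def apply_sync(current: dict, fetched: dict, today: date):
    """Diff fetched prices against the active LiteLLM records.

    current: model -> active record dict; fetched: model -> price.
    Returns (closed_records, created_records, changed_models).
    """
    closed, created, changed = [], [], []
    for model, price in fetched.items():
        old = current.get(model)
        if old is not None and old["price"] == price:
            continue  # unchanged: nothing to close or create
        if old is not None:
            closed.append({**old, "effective_to": today})
            changed.append(model)  # would trigger the admin notification
        created.append({
            "model": model,
            "price": price,
            "source": "litellm",
            "effective_from": today,
            "effective_to": None,
        })
    return closed, created, changed
```

Keeping the diff free of I/O makes it easy to test against a saved copy of the JSON before wiring it into the scheduled task.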
---
## Layer 2 — YAML (non-LLM providers)
**File:** `backend/app/pricing/models.yaml` — versioned in the cost-tracker repo.
Contains providers that LiteLLM does not cover:
```yaml
# ElevenLabs
- provider: elevenlabs
  model: eleven_multilingual_v2
  billing_unit: char
  price_per_1k_usd: 0.30
  effective_from: "2025-01-01"
- provider: elevenlabs
  model: eleven_flash_v2_5
  billing_unit: char
  price_per_1k_usd: 0.11
  effective_from: "2025-01-01"
# Google Cloud TTS
- provider: google_tts
  model: standard
  billing_unit: char
  price_per_1m_usd: 4.00
  effective_from: "2024-01-01"
- provider: google_tts
  model: wavenet
  billing_unit: char
  price_per_1m_usd: 16.00
  effective_from: "2024-01-01"
```
**When to update YAML:**
- ElevenLabs raises/lowers per-char pricing
- Google Cloud TTS changes tier pricing
- Adding a brand-new non-LLM provider
**How to update:**
1. Add a new entry with the new price and `effective_from: "YYYY-MM-DD"`
2. Leave the old entry — it is used for historical cost attribution
3. Deploy the new YAML → loader upserts on startup
**Do NOT delete old entries** — they are needed for retroactive reports.
---
## Layer 3 — Admin override (UI)
Use when you have:
- A negotiated enterprise contract price (different from public pricing)
- A volume discount or committed-use agreement
- A temporary promotional rate
- A price correction before the next LiteLLM sync
**How to create an override:**
1. Admin UI → **Pricing** → find the model → **Override price**
2. Set: `price_per_unit_usd`, `effective_from` (defaults to today), optional `override_reason`
3. Save → the previously active price gets `effective_to` set to the override's `effective_from`, and the override becomes the active price
Override records are never auto-modified by LiteLLM sync.
---
## Historical pricing and retroactive reports
Every usage event is stored with `price_id` — a reference to the exact `model_prices` record active at the time of the call:
- **Retroactive reports are always accurate** — changing a price today does not affect yesterday's costs
- Old `model_prices` records with `effective_to` set are never deleted
- Re-evaluating historical costs with new pricing = manual export + spreadsheet (not a built-in feature)
---
## Monthly reconciliation
Recommended monthly check:
1. Download invoice from Google Cloud Console / ElevenLabs dashboard
2. Compare with cost-tracker "Actual vs Billed" report (Admin UI → Analytics → Reconciliation)
3. If >5% discrepancy: check for `pricing_missing=true` events and add missing model prices
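
The 5% check in step 3 is simple arithmetic; a sketch follows. The function names are hypothetical, and measuring the gap relative to the invoice total is an assumption.

```python
def discrepancy_pct(tracked_usd: float, invoiced_usd: float) -> float:
    """Gap between tracked and invoiced spend, as a percentage of the invoice."""
    if invoiced_usd == 0:
        raise ValueError("invoice total must be non-zero")
    return abs(tracked_usd - invoiced_usd) / invoiced_usd * 100


def needs_investigation(tracked_usd: float, invoiced_usd: float,
                        threshold_pct: float = 5.0) -> bool:
    """True when the gap exceeds the reconciliation threshold."""
    return discrepancy_pct(tracked_usd, invoiced_usd) > threshold_pct
```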

View file

@ -0,0 +1,141 @@
---
title: AI Cost Tracker — Billing Units per Provider
tags: [reference, ai, cost-tracking, providers]
created: 2026-04-27
updated: 2026-04-27
---
# Billing Units per Provider
Reference for how each AI provider bills and how to extract usage data from their API responses.
## Gemini (Google AI / Vertex AI)
**Billing unit:** tokens (input + output separately)
**SDK:** `google-genai` Python SDK
**How to get usage:**
```python
response = await client.models.generate_content(...)
input_tokens = response.usage_metadata.prompt_token_count
output_tokens = response.usage_metadata.candidates_token_count
total_tokens = response.usage_metadata.total_token_count
```
> ⚠️ `usage_metadata` is available on all `generate_content` responses including multimodal (video + text prompts). It was **not being read** in video-accessibility before the cost-tracker integration — added as part of Phase 1.
**Token estimation before the call:**
- Text: `len(text) / 4` (rough heuristic; actual tokenisation varies ±30%)
- Video file: use Google's published token table:
  - < 1 min video: 1,000-2,000 tokens + audio
  - Exact: check `google.genai` file metadata after upload
- Image: ~258 tokens per 512×512 tile
**Pricing:** auto-synced from LiteLLM. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].
---
## Gemini TTS (audio generation via generate_content)
**Billing unit:** tokens (output audio tokens, different rate from text)
**SDK:** same `google-genai`, with `response_modalities=["AUDIO"]`
**How to get usage:**
```python
from google.genai.types import GenerateContentConfig

response = await client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=...,
    config=GenerateContentConfig(response_modalities=["AUDIO"]),
)
output_tokens = response.usage_metadata.candidates_token_count
```
Audio output token rate differs from text output rate — verify in LiteLLM for model `gemini-2.5-flash-preview-tts`.
---
## ElevenLabs TTS
**Billing unit:** characters (input text length)
**SDK:** custom HTTP (`aiohttp` POST to `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}`)
**Response:** returns raw audio bytes. **No usage metadata in response.**
**How to measure:** compute `len(text)` at the call site **before** making the request:
```python
char_count = len(text)
# make the ElevenLabs call
await ct.record(..., chars=char_count, model="eleven_multilingual_v2", ...)
```
**Subscription vs pay-as-you-go:** ElevenLabs bills against a monthly character quota. When quota is exceeded, pay-as-you-go rate applies. The cost-tracker assumes pay-as-you-go for all characters (conservative upper bound). Adjust via admin override if on a subscription plan.
---
## Google Cloud TTS
**Billing unit:** characters of input. For SSML input, Google counts the tags as well (all except `<mark>`), so the raw SSML length is the relevant figure.
**SDK:** `google.cloud.texttospeech` Python SDK
**Response:** `SynthesizeSpeechResponse` with `audio_content` (bytes). **No character count in response.**
**How to measure:**
```python
char_count = len(synthesis_input.text)
# for SSML input use len(synthesis_input.ssml) instead: tags are billed too (except <mark>)
await ct.record(..., chars=char_count, model="standard", ...)
```
**Voice tiers and pricing:**
| Voice type | Billing model name | Price per 1M chars |
|---|---|---|
| Standard | `google_tts/standard` | $4.00 |
| WaveNet | `google_tts/wavenet` | $16.00 |
| Neural2 | `google_tts/neural2` | $16.00 |
| Studio | `google_tts/studio` | $160.00 |
Defined in `pricing/models.yaml` in the cost-tracker repo.
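
A worked example of character-based billing using the table's prices. The dictionary and function names are hypothetical; the real values live in the YAML file.

```python
# Prices per 1M characters, taken from the voice-tier table.
GOOGLE_TTS_PRICE_PER_1M_USD = {
    "google_tts/standard": 4.00,
    "google_tts/wavenet": 16.00,
    "google_tts/neural2": 16.00,
    "google_tts/studio": 160.00,
}


def tts_cost_usd(model: str, chars: int) -> float:
    """Character-billed cost: chars / 1,000,000 * price per 1M chars."""
    return chars / 1_000_000 * GOOGLE_TTS_PRICE_PER_1M_USD[model]
```

For example, 250,000 characters on a WaveNet voice come to 4.00 USD under these prices.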
---
## OpenAI (future)
**Billing unit:** tokens (input + output)
```python
response = client.chat.completions.create(...)
input_tokens = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens
```
Auto-synced by LiteLLM.
---
## Anthropic Claude (future)
**Billing unit:** tokens (input + output)
```python
response = client.messages.create(...)
input_tokens = response.usage.input_tokens
output_tokens = response.usage.output_tokens
```
Auto-synced by LiteLLM.
---
## Whisper (self-hosted)
**Not billed per token.** Runs on Cloud Run / GPU compute.
Billing = infrastructure cost (compute time). Phase 1 does not track this.
Future Phase 2: track `audio_duration_seconds` and approximate cost from Cloud Run billing data.