From 41e0ee3ea1aaa7ef4faa0ef99066f196414d8e6e Mon Sep 17 00:00:00 2001 From: Vadym Samoilenko Date: Mon, 27 Apr 2026 11:11:54 +0100 Subject: [PATCH] vault backup: 2026-04-27 11:11:54 --- wiki/_master-index.md | 5 + wiki/architecture/_index.md | 3 +- wiki/architecture/ai-cost-tracker.md | 78 ++++++ wiki/concepts/_index.md | 5 + wiki/concepts/lazy-user-mirror.md | 58 +++++ wiki/concepts/litellm-pricing-source.md | 59 +++++ wiki/concepts/preflight-record-pattern.md | 59 +++++ wiki/concepts/sync-with-outbox.md | 67 +++++ wiki/projects-overview/ai-cost-tracker.md | 50 ++++ wiki/tech-patterns/_index.md | 3 + .../tech-patterns/cost-tracker-integration.md | 232 ++++++++++++++++++ .../cost-tracker-pricing-sources.md | 131 ++++++++++ wiki/tech-patterns/cost-tracker-providers.md | 141 +++++++++++ 13 files changed, 890 insertions(+), 1 deletion(-) create mode 100644 wiki/architecture/ai-cost-tracker.md create mode 100644 wiki/concepts/lazy-user-mirror.md create mode 100644 wiki/concepts/litellm-pricing-source.md create mode 100644 wiki/concepts/preflight-record-pattern.md create mode 100644 wiki/concepts/sync-with-outbox.md create mode 100644 wiki/projects-overview/ai-cost-tracker.md create mode 100644 wiki/tech-patterns/cost-tracker-integration.md create mode 100644 wiki/tech-patterns/cost-tracker-pricing-sources.md create mode 100644 wiki/tech-patterns/cost-tracker-providers.md diff --git a/wiki/_master-index.md b/wiki/_master-index.md index 1357e95..de2f0de 100644 --- a/wiki/_master-index.md +++ b/wiki/_master-index.md @@ -36,5 +36,10 @@ This 3-hop pattern works for hundreds of articles without vector search. 
| [[wiki/reports/_index\|reports/]] | Weekly and monthly knowledge base summaries | 0 | | [[wiki/infrastructure/_index\|infrastructure/]] | Server inventory: all 10 SSH hosts — optical, optical-dev, baic, librechat, modocmms, box-cli, aimpress, pve | 9 | +| [[wiki/architecture/ai-cost-tracker\|architecture: ai-cost-tracker]] | Shared AI cost tracking service — architecture, Workspace→Team→Project, preflight+record SDK, LiteLLM pricing | 1 | +| [[wiki/tech-patterns/cost-tracker-integration\|tech-patterns: cost-tracker-integration]] | Integration playbook for any Oliver project connecting to ai-cost-tracker (9-step guide + troubleshooting) | 1 | +| [[wiki/tech-patterns/cost-tracker-pricing-sources\|tech-patterns: cost-tracker-pricing-sources]] | Three-layer pricing pipeline: LiteLLM > YAML > override; historical effective_from/to | 1 | +| [[wiki/tech-patterns/cost-tracker-providers\|tech-patterns: cost-tracker-providers]] | Billing units per provider: Gemini usage_metadata, ElevenLabs/GCP TTS len(text), future OpenAI/Anthropic | 1 | + diff --git a/wiki/architecture/_index.md b/wiki/architecture/_index.md index 6fd2acc..9961765 100644 --- a/wiki/architecture/_index.md +++ b/wiki/architecture/_index.md @@ -29,4 +29,5 @@ Cross-cutting architectural decisions that appear in multiple Oliver projects. 5. 
**DEV_AUTH_BYPASS** — skip Azure AD in local dev, always use real auth in production -| [[wiki/architecture/optical-dev-server-deploy\|optical-dev-server-deploy]] | optical-dev GCP server: single-vhost Apache, Include pattern, port table, deploy script cache | Barclays Banner Builder, all Oliver projects | \ No newline at end of file +| [[wiki/architecture/optical-dev-server-deploy\|optical-dev-server-deploy]] | optical-dev GCP server: single-vhost Apache, Include pattern, port table, deploy script cache | Barclays Banner Builder, all Oliver projects | +| [[wiki/architecture/ai-cost-tracker\|ai-cost-tracker]] | Shared AI cost tracking service: Workspace→Team→Project, LiteLLM pricing, preflight+record SDK, hard limits | All Oliver projects | \ No newline at end of file diff --git a/wiki/architecture/ai-cost-tracker.md b/wiki/architecture/ai-cost-tracker.md new file mode 100644 index 0000000..7ed555c --- /dev/null +++ b/wiki/architecture/ai-cost-tracker.md @@ -0,0 +1,78 @@ +--- +title: AI Cost Tracker — Architecture +tags: [architecture, ai, cost-tracking, oliver-platform] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# AI Cost Tracker + +Centralised **shared service** that tracks AI API spend across all Oliver projects. Every project that calls Gemini, ElevenLabs, Google Cloud TTS, or other paid AI APIs sends usage events here. + +## Why it exists + +Oliver runs multiple projects (video-accessibility, One2Edit, Box-pipelines …) all consuming paid AI APIs. Without centralised tracking: no visibility into total spend, no per-client cost attribution, no budget enforcement. + +## Architecture diagram + +``` +┌──────────────────────────────────┐ ┌────────────────────────────────────┐ +│ Any Oliver Project │ │ ai-cost-tracker │ +│ (e.g. video-accessibility) │ POST │ git@bitbucket.org:zlalani/... 
│ +│ │ /v1/ │ │ +│ AI call sites │ preflt │ FastAPI + MongoDB + Redis + React │ +│ (Gemini, ElevenLabs, GCP TTS) │────────►│ POST /v1/preflight │ +│ │ │ │ POST /v1/usage/record │ +│ oliver_cost_tracker SDK │ │ POST /v1/users/upsert │ +│ - preflight(estimate) │◄────────│ POST /v1/projects/upsert │ +│ - record(actual) │ │ │ +│ - SQLite outbox + retry │ │ Admin UI (Microsoft SSO) │ +└──────────────────────────────────┘ │ Workspaces / Teams / Projects │ + │ Pricing + LiteLLM auto-sync │ + │ Budgets + email alerts │ + │ Dashboard + Pivot analytics │ + └────────────────────────────────────┘ +``` + +## Key design decisions + +| Decision | Choice | Why | +|---|---|---| +| Deployment | Separate repo + separate server | Clean isolation, independent scaling | +| Org hierarchy | Workspace → Team → Project | Matches Oliver agency structure | +| User ownership | Each project owns users; lazy mirror in shared | No SSO migration needed | +| Pricing | LiteLLM auto-sync + YAML (non-LLM) + admin override | Auto-updated for LLMs, manual for chars | +| Transport | Sync HTTP + SQLite outbox fallback | Never breaks the AI pipeline | +| Budget enforcement | Hard limits via preflight check | `allow=false` before call is made | +| Auth (projects) | API key per project | Simple, revocable, auditable | +| Auth (admins) | Microsoft SSO | Consistent with all Oliver projects | + +## Org hierarchy + +``` +Workspace (e.g. "Ford", "H&M", "Oliver Internal") + └── Team (e.g. "Video", "Localization", "QC") + └── Project (e.g. "Ford Q3 Campaign", "H&M Spring 2026") + └── Job / event (individual AI call) +``` + +Users live in each project (video-accessibility, etc.) and are **lazily mirrored** into the cost-tracker when their first usage event arrives. 
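The lazy mirroring mentioned above can be sketched as follows. This is a minimal illustration, assuming an in-memory dict in place of the real `users_mirror` collection; `MirrorRecord`, `MirrorStore`, and `touch` are hypothetical names and are not taken from the cost-tracker codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class MirrorRecord:
    source_app: str
    external_id: str
    email: str
    first_seen_at: datetime
    last_seen_at: datetime


class MirrorStore:
    """Hypothetical in-memory stand-in for the users_mirror collection."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], MirrorRecord] = {}

    def touch(self, source_app: str, external_id: str, email: str,
              now: Optional[datetime] = None) -> MirrorRecord:
        """Create the mirror on first sight; afterwards only bump last_seen_at."""
        now = now or datetime.now(timezone.utc)
        key = (source_app, external_id)
        record = self._records.get(key)
        if record is None:
            # First usage event for this user: create the mirror record.
            record = MirrorRecord(source_app, external_id, email, now, now)
            self._records[key] = record
        else:
            # Profile fields are left alone; only the timestamp moves.
            record.last_seen_at = now
        return record
```

Keying on `(source_app, external_id)` is an assumption made here so that two projects can reuse the same user ID without colliding.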
+ +## Tech stack + +Mirrors video-accessibility for team familiarity: +- Backend: **FastAPI + MongoDB Atlas + Redis + Celery** +- Frontend: **React 18 + Vite (TypeScript)** +- Auth admin: **Microsoft AAD (MSAL)** +- Charts: **recharts** +- Tables/pivot: **@tanstack/react-table** + +## Related articles + +- [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — how to connect a new project +- [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]] — how pricing is maintained +- [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] — billing units per AI provider +- [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] — the core usage-tracking pattern +- [[wiki/concepts/lazy-user-mirror|lazy-user-mirror]] — how user sync works +- [[wiki/concepts/sync-with-outbox|sync-with-outbox]] — resilient HTTP calls with SQLite fallback +- [[wiki/projects-overview/ai-cost-tracker|ai-cost-tracker (project card)]] — project registry card diff --git a/wiki/concepts/_index.md b/wiki/concepts/_index.md index 8dfac24..5e59d1d 100644 --- a/wiki/concepts/_index.md +++ b/wiki/concepts/_index.md @@ -50,5 +50,10 @@ | [[wiki/concepts/fish-fisher-conf-d-conflict]] | Fisher plugin manager conflict when conf.d/ files are manually copied — delete and run fisher update to reinstall cleanly | daily/2026-04-22.md | 2026-04-22 | | [[wiki/concepts/macos-python-version-hooks]] | macOS system /usr/bin/python3 = 3.9; Path\|None syntax requires 3.10+; Claude Code hooks must use /opt/homebrew/bin/python3 | daily/2026-04-22.md | 2026-04-22 | +| [[wiki/concepts/preflight-record-pattern]] | preflight(estimate) → AI call → record(actual) — the 3-step AI cost-tracking pattern with budget enforcement | ai-cost-tracker | 2026-04-27 | +| [[wiki/concepts/lazy-user-mirror]] | User mirror created on first AI event, not on user creation — minimal integration surface, source project stays owner | ai-cost-tracker | 2026-04-27 | +| 
[[wiki/concepts/sync-with-outbox]] | Sync HTTP + SQLite outbox: record() never blocks the AI pipeline; background flusher retries up to 10x | ai-cost-tracker | 2026-04-27 | +| [[wiki/concepts/litellm-pricing-source]] | LiteLLM model_prices JSON as auto-updating LLM price source — why scraping provider sites is fragile | ai-cost-tracker | 2026-04-27 | + diff --git a/wiki/concepts/lazy-user-mirror.md b/wiki/concepts/lazy-user-mirror.md new file mode 100644 index 0000000..d950bd3 --- /dev/null +++ b/wiki/concepts/lazy-user-mirror.md @@ -0,0 +1,58 @@ +--- +title: Lazy User Mirror +tags: [concept, architecture, cost-tracking, users] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Lazy User Mirror + +The cost-tracker maintains a read-only mirror of users from each connected project. "Lazy" means the mirror record is created **on first use**, not when the user is created in the source project. + +## Why lazy (not eager)? + +- Source projects own their users; we don't want to add mandatory webhooks to every user CRUD operation +- A user who never triggers an AI call doesn't need to exist in the mirror +- Keeps integration surface minimal — just send `user_external_id` with every event + +## How it works + +1. When `POST /v1/preflight` or `POST /v1/usage/record` arrives with a `user_external_id` not yet in `users_mirror` → cost-tracker **auto-creates** a mirror record using `email`, `full_name`, and `role` from the request payload. + +2. The SDK sends these on every call: + ```python + await ct.preflight( + user_external_id=user.id, + user_email=user.email, # for mirror creation + user_full_name=user.full_name, # for analytics display + user_role=user.role, # for RBAC in analytics + ... + ) + ``` + +3. `users_mirror` record fields: `source_app`, `external_id`, `email`, `full_name`, `role`, `workspace_id` (from project lookup), `team_id`, `first_seen_at`, `last_seen_at`. + +4. 
Subsequent calls with the same `user_external_id` update `last_seen_at` only — no overwrite of profile fields. Profile changes are pushed via explicit `POST /v1/users/upsert`. + +## Explicit sync for profile changes + +If a user's email, role, or name changes in the source project: + +```python +await ct.upsert_user( + external_id=user.id, + email=user.email, + full_name=user.full_name, + role=user.role, + project_external_id=..., +) +``` + +In video-accessibility this is called from: +- `POST /admin/users` — on user creation (`routes_admin.py:101`) +- Login flow — on every successful login (keeps mirror fresh) + +## Analytics implications + +- Users without any AI events never appear in the mirror → analytics shows only users who actually consumed AI +- Orphan users (no `project_external_id` provided) are mirrored but show `workspace_id=null` until assigned via bulk action in admin UI diff --git a/wiki/concepts/litellm-pricing-source.md b/wiki/concepts/litellm-pricing-source.md new file mode 100644 index 0000000..1bd3fb5 --- /dev/null +++ b/wiki/concepts/litellm-pricing-source.md @@ -0,0 +1,59 @@ +--- +title: LiteLLM as Pricing Source +tags: [concept, ai, cost-tracking, pricing, llm] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# LiteLLM as Pricing Source + +The AI Cost Tracker uses the open-source **LiteLLM model prices JSON** as the primary source of truth for LLM pricing. This eliminates the need to scrape provider websites. + +## What is LiteLLM? + +[LiteLLM](https://github.com/BerriAI/litellm) is an open-source Python library (30k+ GitHub stars) for calling 100+ LLM providers with a unified interface. It maintains a community-curated `model_prices_and_context_window.json` covering Gemini, OpenAI, Anthropic, Cohere, Mistral, Together AI, and many others. + +## Why not scrape provider websites directly? 
+ +| Problem | Impact | +|---|---| +| Pricing pages are React SPAs | Need headless browser; brittle | +| Layout changes without notice | Breaks silently; wrong costs logged | +| Different billing units per provider | Complex parsing; easy to get wrong | +| Tier/volume discounts in HTML | Nearly impossible to parse reliably | +| ToS may prohibit scraping | Legal risk | + +LiteLLM maintains all of this in a single structured JSON — battle-tested by thousands of production deployments. + +## The JSON structure + +```json +{ + "gemini/gemini-3-pro-preview": { + "input_cost_per_token": 0.00000125, + "output_cost_per_token": 0.000005, + "litellm_provider": "google", + "mode": "chat", + "max_tokens": 65536 + } +} +``` + +## What LiteLLM does NOT cover + +- **ElevenLabs** — not an LLM; character-based billing +- **Google Cloud TTS** — not an LLM; character-based billing +- **Self-hosted models** — no external billing + +These are defined in `pricing/models.yaml` in the cost-tracker repo. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]]. + +## Keeping prices accurate + +1. Daily Celery beat task (`tasks/pricing_sync.py`) fetches the latest JSON +2. If a price changes → admin gets notified; new price record created with `effective_from=today` +3. Old price records kept forever for historical reporting +4. To freeze at a known-good version: set `LITELLM_COMMIT_HASH` env var + +## The alternative considered + +Direct website scraping was evaluated and rejected due to the problems listed above. LiteLLM is the standard community solution for this exact use case. 
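To illustrate how a per-token entry of the shape shown in the JSON structure section turns into a dollar figure, here is a minimal sketch. It assumes the entry has already been loaded into a dict; `cost_from_litellm_entry` is a hypothetical helper, not part of LiteLLM or the cost-tracker, and the prices are the sample values from that section.

```python
from decimal import Decimal

# Sample entry copied from the JSON structure section (illustrative values).
ENTRY = {
    "input_cost_per_token": 0.00000125,
    "output_cost_per_token": 0.000005,
}


def cost_from_litellm_entry(entry: dict, input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: per-token prices times token counts, in USD."""
    # Convert via str() so the float prices do not carry binary rounding noise.
    input_price = Decimal(str(entry["input_cost_per_token"]))
    output_price = Decimal(str(entry["output_cost_per_token"]))
    return input_price * input_tokens + output_price * output_tokens


# 1000 input tokens and 200 output tokens at the sample prices.
sample_cost = cost_from_litellm_entry(ENTRY, input_tokens=1000, output_tokens=200)
```

Using `Decimal` rather than `float` is a design choice for this sketch: per-token prices are tiny, and summing many of them in binary floating point can drift.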
diff --git a/wiki/concepts/preflight-record-pattern.md b/wiki/concepts/preflight-record-pattern.md new file mode 100644 index 0000000..c1511b6 --- /dev/null +++ b/wiki/concepts/preflight-record-pattern.md @@ -0,0 +1,59 @@ +--- +title: Preflight + Record Pattern +tags: [concept, ai, cost-tracking, patterns] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Preflight + Record Pattern + +The core usage-tracking pattern used by the AI Cost Tracker SDK. Every paid AI call follows the same three steps. + +## The pattern + +``` +preflight(estimated_units) → call AI → record(actual_units) +``` + +1. **Preflight** — before the AI call, ask the cost-tracker: "Is this workspace/project within budget?" + - Input: model name + estimated units (tokens, chars, etc.) + - Output: `allow=true/false`, estimated cost, `request_id` + - If `allow=false` → raise `BudgetExceeded` before calling the AI API + +2. **AI call** — the actual paid API call (unmodified) + +3. **Record** — after the call, report actual usage + - Input: `request_id` from preflight + actual units from response + - Output: `event_id`, `cost_usd` + - If cost-tracker is unavailable → SDK saves to SQLite outbox and retries in background (see [[wiki/concepts/sync-with-outbox|sync-with-outbox]]) + +## Why two steps? 
+ +- **Preflight** enables hard budget enforcement **before** money is spent +- **Record** captures accurate actual usage (estimated ≠ actual for output tokens) +- Decoupling protects the AI pipeline: if cost-tracker goes down after preflight, `record()` still succeeds via outbox + +## Estimation accuracy + +Preflight uses estimated units because output token count is unknown before the call: + +| Provider | What we estimate | Accuracy | +|---|---|---| +| Gemini text | input tokens (`len/4`), output tokens (caller hint) | ±30% | +| Gemini video | input tokens (file-size table), output tokens (hint) | ±50% | +| ElevenLabs | chars (exact — `len(text)`) | 100% | +| Google TTS | chars (exact — `len(text)`) | 100% | + +Over-estimation is better than under-estimation for budget enforcement. If you consistently over-estimate by 50%, tune the default `estimated_output_tokens` hint downward. + +## Hard limit mechanics + +- Preflight computes `current_month_spend + estimated_cost` +- If this exceeds `budget.amount_usd` AND `budget.hard_limit=True` → `allow=false` +- The budget check is **eventual** (reads from pre-aggregated rollups + today's raw events), not transactional — brief overage is possible under high concurrency +- This is acceptable for AI cost tracking: exact-to-the-cent enforcement would require distributed locks and add unacceptable latency + +## Projects using this pattern + +- [[wiki/projects-overview/ai-cost-tracker|ai-cost-tracker]] (defines the pattern) +- video-accessibility (first consumer, Phase 1) diff --git a/wiki/concepts/sync-with-outbox.md b/wiki/concepts/sync-with-outbox.md new file mode 100644 index 0000000..8f7a19e --- /dev/null +++ b/wiki/concepts/sync-with-outbox.md @@ -0,0 +1,67 @@ +--- +title: Sync HTTP + SQLite Outbox Pattern +tags: [concept, architecture, resilience, cost-tracking] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Sync HTTP + SQLite Outbox Pattern + +The pattern used by the cost-tracker SDK to ensure usage events are 
**never lost** even if the cost-tracker service is temporarily unavailable. + +## The problem + +The AI pipeline (Celery worker) calls `ct.record(...)` after each AI API call. If the cost-tracker service is down, a naive implementation would either: +- Silently drop the event (cost data lost) +- Raise an exception (AI pipeline fails — unacceptable) + +## The solution + +``` +record() → try POST /v1/usage/record + ├── success → done + └── failure (timeout / 5xx / network) → save to SQLite outbox + ↓ + background flusher (every 30s) + retries all pending events with + exponential backoff +``` + +## Implementation details + +**SQLite outbox** (one file per worker, default `/tmp/cost_outbox.sqlite`): +- Schema: `(id, ts, payload_json, attempts, last_attempt_at, status)` +- Written synchronously before returning from `record()` on failure +- Never blocks the AI pipeline + +**Background flusher** (asyncio background task): +- Starts when `CostTracker` is initialised +- Every 30 seconds: reads all `status='pending'` rows, retries `POST /v1/usage/record` +- On success: marks `status='sent'` +- After 10 failed attempts: marks `status='dead'`, logs warning → human investigation needed + +**Graceful degradation:** +- `record()` never raises `CostTrackerUnavailable` — it's fire-and-forget via outbox +- `preflight()` returns `allow=true` on connectivity failure by default (`fail_open=True`). Configurable. + +## Configuration + +```python +ct = CostTracker( + ... 
+ outbox_path="/tmp/cost_outbox.sqlite", + flush_interval_seconds=30, + max_retry_attempts=10, + fail_open=True, # preflight returns allow=True when service unreachable +) +``` + +## Monitoring + +- Outbox depth reported in SDK's `/metrics` endpoint (if enabled) +- `dead` status rows require manual review — add to monitoring alert + +## Where this pattern applies + +- Any Oliver project using the `oliver-cost-tracker` SDK +- Generally applicable to any fire-and-forget side-effect call where data loss is unacceptable but the consumer must not block the main flow diff --git a/wiki/projects-overview/ai-cost-tracker.md b/wiki/projects-overview/ai-cost-tracker.md new file mode 100644 index 0000000..f19e865 --- /dev/null +++ b/wiki/projects-overview/ai-cost-tracker.md @@ -0,0 +1,50 @@ +--- +title: AI Cost Tracker — Project Card +tags: [projects, oliver-platform, ai, cost-tracking] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# AI Cost Tracker + +| Field | Value | +|---|---| +| **Repo** | `git@bitbucket.org:zlalani/ai-cost-tracker.git` | +| **Stack** | FastAPI + MongoDB Atlas + Redis + Celery + React 18 + Vite | +| **Auth (admin)** | Microsoft AAD (MSAL) | +| **Auth (API)** | API key per connected project | +| **Status** | Phase 1 — building (April 2026) | +| **First consumer** | video-accessibility | + +## What it does + +Centralised AI cost tracking for all Oliver projects. Every project sends preflight + record events; cost-tracker aggregates, stores, and presents analytics by workspace / team / project / user / model. + +See [[wiki/architecture/ai-cost-tracker|ai-cost-tracker architecture]] for full architecture details. + +## Key URLs + +- Admin UI: TBD (e.g. 
`https://cost.oliver.agency`) +- API base: `https://cost.oliver.agency/v1` +- Health: `https://cost.oliver.agency/v1/health` + +## Connected projects + +| Project | Source app name | Connected since | +|---|---|---| +| video-accessibility | `video-accessibility` | Phase 1 (April 2026) | + +## How to connect a new project + +See [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — full step-by-step guide. + +## Related articles + +- [[wiki/architecture/ai-cost-tracker|Architecture overview]] +- [[wiki/tech-patterns/cost-tracker-integration|Integration guide (new projects)]] +- [[wiki/tech-patterns/cost-tracker-pricing-sources|Pricing sources]] +- [[wiki/tech-patterns/cost-tracker-providers|Provider billing units]] +- [[wiki/concepts/preflight-record-pattern|Preflight + Record pattern]] +- [[wiki/concepts/lazy-user-mirror|Lazy user mirror]] +- [[wiki/concepts/sync-with-outbox|Sync HTTP + outbox pattern]] +- [[wiki/concepts/litellm-pricing-source|LiteLLM as pricing source]] diff --git a/wiki/tech-patterns/_index.md b/wiki/tech-patterns/_index.md index 98d9064..3907a76 100644 --- a/wiki/tech-patterns/_index.md +++ b/wiki/tech-patterns/_index.md @@ -24,6 +24,9 @@ Recurring technology stacks used across Oliver Agency projects. 
Each article cov | [[wiki/tech-patterns/one2edit-api\|one2edit-api]] | One2Edit translation platform API | 3M Portal, H&M O2E Tool | | [[wiki/tech-patterns/nodejs-vanilla-proxy\|nodejs-vanilla-proxy]] | Node.js + Vanilla JS lightweight proxy tools | 3M Portal, Ferrero, Homepage | | [[wiki/tech-patterns/kling-veo-video-api\|kling-veo-video-api]] | Kling AI + Google Veo 3.1 video generation — camera control, I2V, polling | Cinema Studio Pro | +| [[wiki/tech-patterns/cost-tracker-integration\|cost-tracker-integration]] | Step-by-step guide: connect any Oliver project to ai-cost-tracker (API key, SDK install, wrap AI calls, budgets) | All Oliver projects | +| [[wiki/tech-patterns/cost-tracker-pricing-sources\|cost-tracker-pricing-sources]] | Three-layer pricing pipeline: LiteLLM auto-sync > YAML (non-LLM) > admin override; historical effective_from/to | ai-cost-tracker | +| [[wiki/tech-patterns/cost-tracker-providers\|cost-tracker-providers]] | Billing units per AI provider: Gemini tokens (usage_metadata), ElevenLabs chars, Google TTS chars | All AI projects | ## Quick Decision Guide diff --git a/wiki/tech-patterns/cost-tracker-integration.md b/wiki/tech-patterns/cost-tracker-integration.md new file mode 100644 index 0000000..0689b67 --- /dev/null +++ b/wiki/tech-patterns/cost-tracker-integration.md @@ -0,0 +1,232 @@ +--- +title: AI Cost Tracker — Integrating a New Project +tags: [how-to, ai, cost-tracking, integration] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Integrating a New Project with AI Cost Tracker + +Step-by-step guide for connecting any Oliver backend project to the shared cost-tracker. + +## Prerequisites + +- You have a Bitbucket account with access to `zlalani/ai-cost-tracker` +- You can reach the cost-tracker admin UI (ask for the domain) +- Your project is a FastAPI + Python backend (adaptable for other stacks) + +--- + +## Step 1 — Get an API key + +1. Open the cost-tracker Admin UI → **API Keys** → **Create key** +2. 
Name it after your project (e.g. `video-accessibility-prod`) +3. Scope: `preflight`, `record`, `upsert` +4. Copy the key — it is shown **only once** +5. Store in your project's GCP Secret Manager (same pattern as `GEMINI_API_KEY`) + +--- + +## Step 2 — Install the SDK + +```bash +pip install oliver-cost-tracker +# or, while the private package isn't published yet: +# git submodule add git@bitbucket.org:zlalani/ai-cost-tracker.git vendor/cost-tracker +# pip install -e vendor/cost-tracker/sdk/ +``` + +--- + +## Step 3 — Add environment variables + +```env +COST_TRACKER_BASE_URL=https://cost.oliver.agency +COST_TRACKER_API_KEY=ct_live_xxxxxxxxxxxxxxxxxxxx +COST_TRACKER_SOURCE_APP=video-accessibility +COST_TRACKER_OUTBOX_PATH=/tmp/cost_outbox.sqlite +``` + +--- + +## Step 4 — Initialise the client + +In `core/dependencies.py` (FastAPI): + +```python +from oliver_cost_tracker import CostTracker +from functools import lru_cache + +@lru_cache +def get_cost_tracker() -> CostTracker: + return CostTracker( + base_url=settings.cost_tracker_base_url, + api_key=settings.cost_tracker_api_key, + source_app=settings.cost_tracker_source_app, + outbox_path=settings.cost_tracker_outbox_path, + ) +``` + +In Celery workers, instantiate once at module level: + +```python +cost_tracker = CostTracker( + base_url=settings.cost_tracker_base_url, + api_key=settings.cost_tracker_api_key, + source_app="video-accessibility", + outbox_path="/tmp/cost_outbox.sqlite", +) +``` + +--- + +## Step 5 — Wrap AI calls (preflight → call → record) + +This is the **core pattern**. Every paid AI call follows three steps. +See [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] for the full explanation. + +```python +from oliver_cost_tracker import CostTracker, BudgetExceeded +import time + +async def call_gemini_with_tracking( + ct: CostTracker, + prompt: str, + user_id: str, + job_id: str, + project_id: str | None = None, +) -> GenerateContentResponse: + + # 1. 
Preflight — checks budget, returns allow/deny + preflight = await ct.preflight( + user_external_id=user_id, + project_external_id=project_id, + job_external_id=job_id, + model="gemini-3-pro-preview", + estimated_input_tokens=ct.estimate_tokens(prompt), + estimated_output_tokens=2048, # conservative overestimate + ) + if not preflight.allow: + raise BudgetExceeded(preflight.deny_reason) + + # 2. AI call (google-genai exposes its coroutine API under client.aio) + t0 = time.monotonic() + response = await client.aio.models.generate_content( + model="gemini-3-pro-preview", + contents=prompt, + ) + elapsed_ms = int((time.monotonic() - t0) * 1000) + + # 3. Record actual usage + await ct.record( + request_id=preflight.request_id, + user_external_id=user_id, + project_external_id=project_id, + job_external_id=job_id, + model="gemini-3-pro-preview", + input_tokens=response.usage_metadata.prompt_token_count, + output_tokens=response.usage_metadata.candidates_token_count, + latency_ms=elapsed_ms, + status="success", + ) + return response +``` + +For providers without usage metadata (ElevenLabs, Google Cloud TTS), compute units from input: + +```python +await ct.record( + ... + model="eleven_multilingual_v2", + chars=len(text), # billing unit is characters + latency_ms=elapsed_ms, + status="success", +) +``` + +See [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] for all provider details. + +--- + +## Step 6 — Attribution in background tasks (Celery) + +Celery tasks typically receive only `job_id`. 
Fetch `user_id` and `project_id` from the job document: + +```python +import asyncio + +@celery_app.task +def my_ai_task(job_id: str): + # Celery does not await coroutine tasks, so run the async body explicitly + # (_run_ai_task is an illustrative helper name) + return asyncio.run(_run_ai_task(job_id)) + +async def _run_ai_task(job_id: str): + job = await db.jobs.find_one({"_id": job_id}, {"client_id": 1, "project_id": 1}) + user_id = job["client_id"] + project_id = job.get("project_id") # may be None for old jobs + + result = await call_gemini_with_tracking( + ct=cost_tracker, + prompt=build_prompt(job), + user_id=user_id, + job_id=job_id, + project_id=project_id, + ) + return result +``` + +For per-cue Celery tasks (like `tts_synthesis.py`), add `user_id` and `project_id` to the task kwargs to avoid an extra DB fetch per cue. + +--- + +## Step 7 — Create Workspace / Team / Project in Admin UI + +Before going live, set up the org structure in the cost-tracker Admin UI: + +1. **Workspace** → Create (e.g. "Ford", "H&M", or "video-accessibility" for internal use) +2. **Team** → Create under the workspace (e.g. "Video Production") +3. **Project** → Create under the team, set `source_app` = your project name and `external_id` = the ID your project will send in `project_external_id` + +> Jobs sent before a project is created appear as **Unassigned** in the dashboard. You can reassign them in bulk later. + +--- + +## Step 8 — Set budgets and alerts + +In Admin UI → **Budgets** → Create: + +- `scope_type`: workspace / team / project +- `amount_usd`: monthly limit +- `alert_thresholds`: [0.5, 0.8, 1.0] (email at 50%, 80%, 100%) +- `hard_limit`: true = preflight returns `allow=false` when exceeded + +--- + +## Step 9 — Smoke test + +```bash +# 1. Check service is up +curl https://cost.oliver.agency/v1/health + +# 2. Preflight +curl -X POST https://cost.oliver.agency/v1/preflight \ + -H "X-API-Key: ct_live_xxx" \ + -H "Content-Type: application/json" \ + -d '{"user_external_id":"test-user","model":"gemini-3-pro-preview","estimated_units":{"input_tokens":1000,"output_tokens":200}}' +# expect: {"allow":true,"estimated_cost_usd":...,"request_id":"..."} + +# 3. 
Record +curl -X POST https://cost.oliver.agency/v1/usage/record \ + -H "X-API-Key: ct_live_xxx" \ + -H "Content-Type: application/json" \ + -d '{"request_id":"","user_external_id":"test-user","model":"gemini-3-pro-preview","units":{"input_tokens":987,"output_tokens":180},"latency_ms":1200,"status":"success"}' +# expect: {"event_id":"...","cost_usd":0.00214} +``` + +Then open the Admin UI Dashboard — the test event should appear within seconds. + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| `preflight` returns 401 | Wrong or missing API key | Check `COST_TRACKER_API_KEY` env var; verify key is active in Admin UI | +| `preflight` returns `allow=false` | Budget exceeded | Admin UI → Budgets → raise limit or wait for next billing period | +| Events not appearing in dashboard | Outbox accumulating (service down) | Check `/tmp/cost_outbox.sqlite`; service auto-retries when back up | +| `cost_usd=null` on events | Model not in pricing table | Admin UI → Pricing → add model, or check LiteLLM sync task ran | +| Slow preflight (>500ms) | cost-tracker under load or network | SDK retries automatically; if persistent, check service metrics | +| Token estimates wildly off | Char/4 heuristic for video prompts | Gemini video needs file_size-based lookup; see [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] | diff --git a/wiki/tech-patterns/cost-tracker-pricing-sources.md b/wiki/tech-patterns/cost-tracker-pricing-sources.md new file mode 100644 index 0000000..93faee9 --- /dev/null +++ b/wiki/tech-patterns/cost-tracker-pricing-sources.md @@ -0,0 +1,131 @@ +--- +title: AI Cost Tracker — Pricing Sources +tags: [how-to, ai, cost-tracking, pricing] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# AI Cost Tracker — Pricing Sources + +The cost-tracker uses a **three-layer hybrid pricing pipeline**. Understanding the priority order is essential for accurate billing attribution. 
+ +## Priority order + +``` +override (highest — set manually by admin) + yaml (fallback — versioned in repo for non-LLM providers) + litellm (lowest — auto-synced daily from open source) +``` + +`compute_cost(provider, model, units, ts)` returns the cost using the **highest-priority active price** for the given model at timestamp `ts`. + +--- + +## Layer 1 — LiteLLM auto-sync (LLM providers) + +**Source:** `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json` + +**Coverage:** Gemini, OpenAI, Anthropic, Cohere, Mistral, Together, and 100+ others. + +**Sync schedule:** Celery beat task `tasks/pricing_sync.py` runs **daily at 02:00 UTC**. + +**What happens on sync:** +1. Fetches the JSON (pinned to a configurable commit hash in `LITELLM_COMMIT_HASH` env var) +2. Maps `input_cost_per_token` / `output_cost_per_token` to our schema +3. For each model: + - If no existing price → creates new `model_prices` record with `source="litellm"` + - If price unchanged → updates `litellm_commit_hash`, no other change + - If price **changed** → closes old record (`effective_to=today`), creates new record, sends **admin notification email** + +> **Note:** Auto-price changes never silently modify `source="override"` records. If you have an override active, the sync logs a divergence warning but leaves your override intact. + +**To pin a specific version** (for reproducibility): +```env +LITELLM_COMMIT_HASH=abc123def456 # pin to a known-good commit +``` + +See [[wiki/concepts/litellm-pricing-source|litellm-pricing-source]] for deeper explanation. + +--- + +## Layer 2 — YAML (non-LLM providers) + +**File:** `backend/app/pricing/models.yaml` — versioned in the cost-tracker repo. 
+
+Contains providers that LiteLLM does not cover:
+
+```yaml
+# ElevenLabs
+- provider: elevenlabs
+  model: eleven_multilingual_v2
+  billing_unit: char
+  price_per_1k_usd: 0.30
+  effective_from: "2025-01-01"
+
+- provider: elevenlabs
+  model: eleven_flash_v2_5
+  billing_unit: char
+  price_per_1k_usd: 0.11
+  effective_from: "2025-01-01"
+
+# Google Cloud TTS
+- provider: google_tts
+  model: standard
+  billing_unit: char
+  price_per_1m_usd: 4.00
+  effective_from: "2024-01-01"
+
+- provider: google_tts
+  model: wavenet
+  billing_unit: char
+  price_per_1m_usd: 16.00
+  effective_from: "2024-01-01"
+```
+
+**When to update YAML:**
+- ElevenLabs raises/lowers per-char pricing
+- Google Cloud TTS changes tier pricing
+- Adding a brand-new non-LLM provider
+
+**How to update:**
+1. Add a new entry with the new price and `effective_from: "YYYY-MM-DD"`
+2. Leave the old entry in place; it is used for historical cost attribution
+3. Deploy the new YAML → loader upserts on startup
+
+**Do NOT delete old entries**: they are needed for retroactive reports.
+
+---
+
+## Layer 3 — Admin override (UI)
+
+Use when you have:
+- A negotiated enterprise contract price (different from public pricing)
+- A volume discount or committed-use agreement
+- A temporary promotional rate
+- A price correction before the next LiteLLM sync
+
+**How to create an override:**
+1. Admin UI → **Pricing** → find the model → **Override price**
+2. Set: `price_per_unit_usd`, `effective_from` (defaults to today), optional `override_reason`
+3. Save → the previous price record gets its `effective_to` set to the override's `effective_from`, and the override becomes the active price
+
+Override records are never auto-modified by LiteLLM sync.
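
The resolution rule described above (highest-priority source among the prices active at the call's timestamp) can be sketched as follows. This is an illustration, not the service's actual `compute_cost`: `PriceRecord`, `is_active`, `resolve_price`, the `example-model` name and the prices are all hypothetical, and treating validity as the half-open interval `[effective_from, effective_to)` is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Higher rank wins, mirroring override > yaml > litellm.
SOURCE_RANK = {"litellm": 0, "yaml": 1, "override": 2}


@dataclass(frozen=True)
class PriceRecord:
    model: str
    source: str                          # "litellm", "yaml" or "override"
    price_per_unit_usd: float
    effective_from: date
    effective_to: Optional[date] = None  # None means the record is still open


def is_active(record: PriceRecord, on: date) -> bool:
    # Assumption: a record is valid on [effective_from, effective_to).
    if on < record.effective_from:
        return False
    return record.effective_to is None or on < record.effective_to


def resolve_price(records, model: str, on: date) -> Optional[PriceRecord]:
    """Return the highest-priority record active for `model` on date `on`."""
    active = [r for r in records if r.model == model and is_active(r, on)]
    if not active:
        return None  # no price known for this model at that date
    # Prefer the higher-ranked source; break ties with the newest record.
    return max(active, key=lambda r: (SOURCE_RANK[r.source], r.effective_from))


# Hypothetical data: a synced price, later superseded by a negotiated override.
records = [
    PriceRecord("example-model", "litellm", 2.0e-6, date(2026, 1, 1)),
    PriceRecord("example-model", "override", 1.5e-6, date(2026, 3, 1)),
]
february = resolve_price(records, "example-model", date(2026, 2, 1))
april = resolve_price(records, "example-model", date(2026, 4, 1))
print(february.source, april.source)
```

Resolving against the event's own date rather than "now" is what keeps older events priced with the record that was active when they happened.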
+
+---
+
+## Historical pricing and retroactive reports
+
+Every usage event is stored with `price_id`, a reference to the exact `model_prices` record active at the time of the call:
+
+- **Retroactive reports are always accurate**: changing a price today does not affect yesterday's costs
+- Old `model_prices` records with `effective_to` set are never deleted
+- Re-evaluating historical costs with new pricing = manual export + spreadsheet (not a built-in feature)
+
+---
+
+## Monthly reconciliation
+
+Recommended monthly check:
+1. Download invoice from Google Cloud Console / ElevenLabs dashboard
+2. Compare with cost-tracker "Actual vs Billed" report (Admin UI → Analytics → Reconciliation)
+3. If >5% discrepancy: check for `pricing_missing=true` events and add missing model prices
diff --git a/wiki/tech-patterns/cost-tracker-providers.md b/wiki/tech-patterns/cost-tracker-providers.md
new file mode 100644
index 0000000..be96686
--- /dev/null
+++ b/wiki/tech-patterns/cost-tracker-providers.md
@@ -0,0 +1,141 @@
+---
+title: AI Cost Tracker — Billing Units per Provider
+tags: [reference, ai, cost-tracking, providers]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# Billing Units per Provider
+
+Reference for how each AI provider bills and how to extract usage data from their API responses.
+
+## Gemini (Google AI / Vertex AI)
+
+**Billing unit:** tokens (input + output separately)
+
+**SDK:** `google-genai` Python SDK
+
+**How to get usage:**
+```python
+response = await client.aio.models.generate_content(...)  # async calls go through client.aio
+
+input_tokens = response.usage_metadata.prompt_token_count
+output_tokens = response.usage_metadata.candidates_token_count
+total_tokens = response.usage_metadata.total_token_count
+```
+
+> ⚠️ `usage_metadata` is available on all `generate_content` responses including multimodal (video + text prompts). It was **not being read** in video-accessibility before the cost-tracker integration; reading it was added as part of Phase 1.
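
A minimal sketch of turning `usage_metadata` into the `units` payload sent with a record call (the same `input_tokens` / `output_tokens` keys as in the curl example). The helper name `units_from_usage_metadata` is hypothetical, `SimpleNamespace` stands in for a real SDK response so the snippet runs without network access, and falling back to zero when a count is missing is an assumption rather than documented cost-tracker behaviour.

```python
from types import SimpleNamespace


def units_from_usage_metadata(usage_metadata) -> dict:
    """Map Gemini usage_metadata onto the units dict used for record."""
    if usage_metadata is None:
        # Assumption: report zero tokens rather than failing the request path.
        return {"input_tokens": 0, "output_tokens": 0}
    return {
        "input_tokens": getattr(usage_metadata, "prompt_token_count", None) or 0,
        "output_tokens": getattr(usage_metadata, "candidates_token_count", None) or 0,
    }


# Stand-in for response.usage_metadata from a real generate_content call.
fake_usage = SimpleNamespace(
    prompt_token_count=987,
    candidates_token_count=180,
    total_token_count=1167,
)
units = units_from_usage_metadata(fake_usage)
print(units)
```

Keeping the mapping in one helper means every call site reports the same keys, so a response without counts shows up as a zero-token event instead of an exception.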
+
+**Token estimation before the call:**
+- Text: `len(text) / 4` (rough heuristic; actual tokenisation varies ±30%)
+- Video file: use Google's published token table:
+  - < 1 min video ≈ 1,000–2,000 tokens + audio
+  - Exact: call `client.models.count_tokens` with the uploaded file as contents
+- Image: ~258 tokens per 512×512 tile
+
+**Pricing:** auto-synced from LiteLLM. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].
+
+---
+
+## Gemini TTS (audio generation via generate_content)
+
+**Billing unit:** tokens (output audio tokens, different rate from text)
+
+**SDK:** same `google-genai`, with `response_modalities=["AUDIO"]`
+
+**How to get usage:**
+```python
+response = await client.aio.models.generate_content(
+    model="gemini-2.5-flash-preview-tts",
+    contents=...,
+    config=GenerateContentConfig(response_modalities=["AUDIO"]),
+)
+output_tokens = response.usage_metadata.candidates_token_count
+```
+
+The audio output token rate differs from the text output rate; verify the price in LiteLLM for model `gemini-2.5-flash-preview-tts`.
+
+---
+
+## ElevenLabs TTS
+
+**Billing unit:** characters (input text length)
+
+**SDK:** custom HTTP (`aiohttp` POST to `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}`)
+
+**Response:** returns raw audio bytes. **No usage metadata in response.**
+
+**How to measure:** compute `len(text)` at the call site **before** making the request:
+
+```python
+char_count = len(text)
+# make the ElevenLabs call
+await ct.record(..., chars=char_count, model="eleven_multilingual_v2", ...)
+```
+
+**Subscription vs pay-as-you-go:** ElevenLabs bills against a monthly character quota. When the quota is exceeded, the pay-as-you-go rate applies. The cost-tracker assumes pay-as-you-go for all characters (conservative upper bound). Adjust via admin override if on a subscription plan.
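
As a worked example of the pay-as-you-go upper bound, the sketch below prices a text by its character count using the per-1k rates from the YAML layer. The function name `elevenlabs_cost_usd` and the in-code price table are illustrative only; in the service the price comes from the pricing pipeline, and whether ElevenLabs counts characters exactly as Python's `len` does (Unicode code points) is an assumption.

```python
from decimal import Decimal

# Per-1k-character prices copied from the models.yaml entries shown in the pricing page.
PRICE_PER_1K_CHARS_USD = {
    "eleven_multilingual_v2": Decimal("0.30"),
    "eleven_flash_v2_5": Decimal("0.11"),
}


def elevenlabs_cost_usd(model: str, text: str):
    """Return (char_count, cost) assuming every character is pay-as-you-go."""
    char_count = len(text)  # measured before the request, as described above
    cost = PRICE_PER_1K_CHARS_USD[model] * char_count / 1000
    return char_count, cost


chars, cost = elevenlabs_cost_usd("eleven_multilingual_v2", "a" * 2500)
print(chars, cost)
```

`Decimal` is used so that small per-character amounts do not pick up binary floating-point noise when many events are summed.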
+
+---
+
+## Google Cloud TTS
+
+**Billing unit:** characters (input length; for SSML input, tags other than `<mark>` count toward the billed total)
+
+**SDK:** `google.cloud.texttospeech` Python SDK
+
+**Response:** `SynthesizeSpeechResponse` with `audio_content` (bytes). **No character count in response.**
+
+**How to measure:**
+```python
+char_count = len(synthesis_input.text)
+# for SSML input, Google also bills the tags (except <mark>): use len(synthesis_input.ssml)
+await ct.record(..., chars=char_count, model="standard", ...)
+```
+
+**Voice tiers and pricing:**
+
+| Voice type | Billing model name | Price per 1M chars |
+|---|---|---|
+| Standard | `google_tts/standard` | $4.00 |
+| WaveNet | `google_tts/wavenet` | $16.00 |
+| Neural2 | `google_tts/neural2` | $16.00 |
+| Studio | `google_tts/studio` | $160.00 |
+
+Defined in `backend/app/pricing/models.yaml` in the cost-tracker repo.
+
+---
+
+## OpenAI (future)
+
+**Billing unit:** tokens (input + output)
+
+```python
+response = client.chat.completions.create(...)
+input_tokens = response.usage.prompt_tokens
+output_tokens = response.usage.completion_tokens
+```
+
+Auto-synced by LiteLLM.
+
+---
+
+## Anthropic Claude (future)
+
+**Billing unit:** tokens (input + output)
+
+```python
+response = client.messages.create(...)
+input_tokens = response.usage.input_tokens
+output_tokens = response.usage.output_tokens
+```
+
+Auto-synced by LiteLLM.
+
+---
+
+## Whisper (self-hosted)
+
+**Not billed per token.** Runs on Cloud Run / GPU compute.
+
+Billing = infrastructure cost (compute time). Phase 1 does not track this.
+Future Phase 2: track `audio_duration_seconds` and approximate cost from Cloud Run billing data.