vault backup: 2026-04-27 11:11:54
parent e128d9e58a
commit 41e0ee3ea1
13 changed files with 890 additions and 1 deletions
|
|
@ -36,5 +36,10 @@ This 3-hop pattern works for hundreds of articles without vector search.
|
|||
| [[wiki/reports/_index\|reports/]] | Weekly and monthly knowledge base summaries | 0 |
|
||||
| [[wiki/infrastructure/_index\|infrastructure/]] | Server inventory: all 10 SSH hosts — optical, optical-dev, baic, librechat, modocmms, box-cli, aimpress, pve | 9 |
|
||||
|
||||
| [[wiki/architecture/ai-cost-tracker\|architecture: ai-cost-tracker]] | Shared AI cost tracking service — architecture, Workspace→Team→Project, preflight+record SDK, LiteLLM pricing | 1 |
|
||||
| [[wiki/tech-patterns/cost-tracker-integration\|tech-patterns: cost-tracker-integration]] | Integration playbook for any Oliver project connecting to ai-cost-tracker (9-step guide + troubleshooting) | 1 |
|
||||
| [[wiki/tech-patterns/cost-tracker-pricing-sources\|tech-patterns: cost-tracker-pricing-sources]] | Three-layer pricing pipeline: LiteLLM → YAML → override (later layers take precedence); historical effective_from/to | 1 |
|
||||
| [[wiki/tech-patterns/cost-tracker-providers\|tech-patterns: cost-tracker-providers]] | Billing units per provider: Gemini usage_metadata, ElevenLabs/GCP TTS len(text), future OpenAI/Anthropic | 1 |
|
||||
|
||||
<!-- New topic folders added here automatically as they are created -->
|
||||
<!-- Format: | [[wiki/topic/_index\|topic/]] | One-line description | N articles | -->
|
||||
|
|
|
|||
|
|
@ -29,4 +29,5 @@ Cross-cutting architectural decisions that appear in multiple Oliver projects.
|
|||
5. **DEV_AUTH_BYPASS** — skip Azure AD in local dev, always use real auth in production
|
||||
|
||||
|
||||
| [[wiki/architecture/optical-dev-server-deploy\|optical-dev-server-deploy]] | optical-dev GCP server: single-vhost Apache, Include pattern, port table, deploy script cache | Barclays Banner Builder, all Oliver projects |
|
||||
| [[wiki/architecture/ai-cost-tracker\|ai-cost-tracker]] | Shared AI cost tracking service: Workspace→Team→Project, LiteLLM pricing, preflight+record SDK, hard limits | All Oliver projects |
|
||||
78
wiki/architecture/ai-cost-tracker.md
Normal file
|
|
@ -0,0 +1,78 @@
|
|||
---
|
||||
title: AI Cost Tracker — Architecture
|
||||
tags: [architecture, ai, cost-tracking, oliver-platform]
|
||||
created: 2026-04-27
|
||||
updated: 2026-04-27
|
||||
---
|
||||
|
||||
# AI Cost Tracker
|
||||
|
||||
Centralised **shared service** that tracks AI API spend across all Oliver projects. Every project that calls Gemini, ElevenLabs, Google Cloud TTS, or other paid AI APIs sends usage events here.
|
||||
|
||||
## Why it exists
|
||||
|
||||
Oliver runs multiple projects (video-accessibility, One2Edit, Box-pipelines …) all consuming paid AI APIs. Without centralised tracking: no visibility into total spend, no per-client cost attribution, no budget enforcement.
|
||||
|
||||
## Architecture diagram
|
||||
|
||||
```
┌──────────────────────────────────┐         ┌────────────────────────────────────┐
│  Any Oliver Project              │         │  ai-cost-tracker                   │
│  (e.g. video-accessibility)      │  POST   │  git@bitbucket.org:zlalani/...     │
│                                  │  /v1/   │                                    │
│  AI call sites                   │  preflt │  FastAPI + MongoDB + Redis + React │
│  (Gemini, ElevenLabs, GCP TTS)   │────────►│  POST /v1/preflight                │
│        │                         │         │  POST /v1/usage/record             │
│  oliver_cost_tracker SDK         │         │  POST /v1/users/upsert             │
│   - preflight(estimate)          │◄────────│  POST /v1/projects/upsert          │
│   - record(actual)               │         │                                    │
│   - SQLite outbox + retry        │         │  Admin UI (Microsoft SSO)          │
└──────────────────────────────────┘         │  Workspaces / Teams / Projects     │
                                             │  Pricing + LiteLLM auto-sync       │
                                             │  Budgets + email alerts            │
                                             │  Dashboard + Pivot analytics       │
                                             └────────────────────────────────────┘
```
|
||||
|
||||
## Key design decisions
|
||||
|
||||
| Decision | Choice | Why |
|---|---|---|
| Deployment | Separate repo + separate server | Clean isolation, independent scaling |
| Org hierarchy | Workspace → Team → Project | Matches Oliver agency structure |
| User ownership | Each project owns users; lazy mirror in shared | No SSO migration needed |
| Pricing | LiteLLM auto-sync + YAML (non-LLM) + admin override | Auto-updated for LLMs, manual for chars |
| Transport | Sync HTTP + SQLite outbox fallback | Never breaks the AI pipeline |
| Budget enforcement | Hard limits via preflight check | `allow=false` before call is made |
| Auth (projects) | API key per project | Simple, revocable, auditable |
| Auth (admins) | Microsoft SSO | Consistent with all Oliver projects |
|
||||
|
||||
## Org hierarchy
|
||||
|
||||
```
Workspace (e.g. "Ford", "H&M", "Oliver Internal")
└── Team (e.g. "Video", "Localization", "QC")
    └── Project (e.g. "Ford Q3 Campaign", "H&M Spring 2026")
        └── Job / event (individual AI call)
```
|
||||
|
||||
Users live in each project (video-accessibility, etc.) and are **lazily mirrored** into the cost-tracker when their first usage event arrives.
|
||||
|
||||
## Tech stack
|
||||
|
||||
Mirrors video-accessibility for team familiarity:
|
||||
- Backend: **FastAPI + MongoDB Atlas + Redis + Celery**
|
||||
- Frontend: **React 18 + Vite (TypeScript)**
|
||||
- Auth admin: **Microsoft AAD (MSAL)**
|
||||
- Charts: **recharts**
|
||||
- Tables/pivot: **@tanstack/react-table**
|
||||
|
||||
## Related articles
|
||||
|
||||
- [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — how to connect a new project
|
||||
- [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]] — how pricing is maintained
|
||||
- [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] — billing units per AI provider
|
||||
- [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] — the core usage-tracking pattern
|
||||
- [[wiki/concepts/lazy-user-mirror|lazy-user-mirror]] — how user sync works
|
||||
- [[wiki/concepts/sync-with-outbox|sync-with-outbox]] — resilient HTTP calls with SQLite fallback
|
||||
- [[wiki/projects-overview/ai-cost-tracker|ai-cost-tracker (project card)]] — project registry card
|
||||
|
|
@ -50,5 +50,10 @@
|
|||
| [[wiki/concepts/fish-fisher-conf-d-conflict]] | Fisher plugin manager conflict when conf.d/ files are manually copied — delete and run fisher update to reinstall cleanly | daily/2026-04-22.md | 2026-04-22 |
|
||||
| [[wiki/concepts/macos-python-version-hooks]] | macOS system /usr/bin/python3 = 3.9; Path\|None syntax requires 3.10+; Claude Code hooks must use /opt/homebrew/bin/python3 | daily/2026-04-22.md | 2026-04-22 |
|
||||
|
||||
| [[wiki/concepts/preflight-record-pattern]] | preflight(estimate) → AI call → record(actual) — the 3-step AI cost-tracking pattern with budget enforcement | ai-cost-tracker | 2026-04-27 |
|
||||
| [[wiki/concepts/lazy-user-mirror]] | User mirror created on first AI event, not on user creation — minimal integration surface, source project stays owner | ai-cost-tracker | 2026-04-27 |
|
||||
| [[wiki/concepts/sync-with-outbox]] | Sync HTTP + SQLite outbox: record() never blocks the AI pipeline; background flusher retries up to 10x | ai-cost-tracker | 2026-04-27 |
|
||||
| [[wiki/concepts/litellm-pricing-source]] | LiteLLM model_prices JSON as auto-updating LLM price source — why scraping provider sites is fragile | ai-cost-tracker | 2026-04-27 |
|
||||
|
||||
<!-- Articles added automatically by compile.py -->
|
||||
<!-- Format: | [[concepts/slug]] | One-line summary | daily/YYYY-MM-DD.md | date | -->
|
||||
|
|
|
|||
58
wiki/concepts/lazy-user-mirror.md
Normal file
|
|
@ -0,0 +1,58 @@
|
|||
---
|
||||
title: Lazy User Mirror
|
||||
tags: [concept, architecture, cost-tracking, users]
|
||||
created: 2026-04-27
|
||||
updated: 2026-04-27
|
||||
---
|
||||
|
||||
# Lazy User Mirror
|
||||
|
||||
The cost-tracker maintains a read-only mirror of users from each connected project. "Lazy" means the mirror record is created **on first use**, not when the user is created in the source project.
|
||||
|
||||
## Why lazy (not eager)?
|
||||
|
||||
- Source projects own their users; we don't want to add mandatory webhooks to every user CRUD operation
|
||||
- A user who never triggers an AI call doesn't need to exist in the mirror
|
||||
- Keeps integration surface minimal — just send `user_external_id` with every event
|
||||
|
||||
## How it works
|
||||
|
||||
1. When `POST /v1/preflight` or `POST /v1/usage/record` arrives with a `user_external_id` not yet in `users_mirror` → cost-tracker **auto-creates** a mirror record from the profile fields in the request payload (email, full name, and role; the SDK passes them as `user_email`, `user_full_name`, and `user_role`).
|
||||
|
||||
2. The SDK sends these on every call:
|
||||
   ```python
   await ct.preflight(
       user_external_id=user.id,
       user_email=user.email,          # for mirror creation
       user_full_name=user.full_name,  # for analytics display
       user_role=user.role,            # for RBAC in analytics
       ...
   )
   ```
|
||||
|
||||
3. `users_mirror` record fields: `source_app`, `external_id`, `email`, `full_name`, `role`, `workspace_id` (from project lookup), `team_id`, `first_seen_at`, `last_seen_at`.
|
||||
|
||||
4. Subsequent calls with the same `user_external_id` update `last_seen_at` only — no overwrite of profile fields. Profile changes are pushed via explicit `POST /v1/users/upsert`.
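The create-on-first-use and update-on-later-use behaviour described in the steps above can be sketched in a few lines. This is a minimal in-memory illustration, not the service's actual code: `UsersMirror`, `MirrorUser`, `touch`, and `upsert` are hypothetical names, a dict stands in for the `users_mirror` collection, and the workspace/team lookup is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class MirrorUser:
    source_app: str
    external_id: str
    email: str
    full_name: str
    role: str
    first_seen_at: datetime
    last_seen_at: datetime


class UsersMirror:
    """Hypothetical in-memory stand-in for the users_mirror collection."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], MirrorUser] = {}

    def touch(self, source_app: str, external_id: str, email: str,
              full_name: str, role: str,
              now: Optional[datetime] = None) -> MirrorUser:
        """Handle a preflight/record event: create on first sight, else bump last_seen_at."""
        now = now or datetime.now(timezone.utc)
        key = (source_app, external_id)
        row = self._rows.get(key)
        if row is None:
            row = MirrorUser(source_app, external_id, email, full_name, role, now, now)
            self._rows[key] = row
        else:
            row.last_seen_at = now  # profile fields are deliberately left untouched
        return row

    def upsert(self, source_app: str, external_id: str, email: str,
               full_name: str, role: str,
               now: Optional[datetime] = None) -> MirrorUser:
        """Explicit sync: create if missing, then overwrite the profile fields."""
        row = self.touch(source_app, external_id, email, full_name, role, now)
        row.email, row.full_name, row.role = email, full_name, role
        return row
```

Whether an explicit upsert should also refresh `last_seen_at` (as this sketch does) is an assumption; the source does not say.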
|
||||
|
||||
## Explicit sync for profile changes
|
||||
|
||||
If a user's email, role, or name changes in the source project:
|
||||
|
||||
```python
await ct.upsert_user(
    external_id=user.id,
    email=user.email,
    full_name=user.full_name,
    role=user.role,
    project_external_id=...,
)
```
|
||||
|
||||
In video-accessibility this is called from:
|
||||
- `POST /admin/users` — on user creation (`routes_admin.py:101`)
|
||||
- Login flow — on every successful login (keeps mirror fresh)
|
||||
|
||||
## Analytics implications
|
||||
|
||||
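- Users who never trigger an AI event and are never synced via `upsert_user` do not appear in the mirror. Where a project calls `upsert_user` on user creation or login (as video-accessibility does), a user can be mirrored before consuming any AI, so analytics should filter on usage events rather than on mirror membership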
|
||||
- Orphan users (no `project_external_id` provided) are mirrored but show `workspace_id=null` until assigned via bulk action in admin UI
|
||||
59
wiki/concepts/litellm-pricing-source.md
Normal file
|
|
@ -0,0 +1,59 @@
|
|||
---
|
||||
title: LiteLLM as Pricing Source
|
||||
tags: [concept, ai, cost-tracking, pricing, llm]
|
||||
created: 2026-04-27
|
||||
updated: 2026-04-27
|
||||
---
|
||||
|
||||
# LiteLLM as Pricing Source
|
||||
|
||||
The AI Cost Tracker uses the open-source **LiteLLM model prices JSON** as the primary source of truth for LLM pricing. This eliminates the need to scrape provider websites.
|
||||
|
||||
## What is LiteLLM?
|
||||
|
||||
[LiteLLM](https://github.com/BerriAI/litellm) is an open-source Python library (30k+ GitHub stars) for calling 100+ LLM providers with a unified interface. It maintains a community-curated `model_prices_and_context_window.json` covering Gemini, OpenAI, Anthropic, Cohere, Mistral, Together AI, and many others.
|
||||
|
||||
## Why not scrape provider websites directly?
|
||||
|
||||
| Problem | Impact |
|---|---|
| Pricing pages are React SPAs | Need headless browser; brittle |
| Layout changes without notice | Breaks silently; wrong costs logged |
| Different billing units per provider | Complex parsing; easy to get wrong |
| Tier/volume discounts in HTML | Nearly impossible to parse reliably |
| ToS may prohibit scraping | Legal risk |
|
||||
|
||||
LiteLLM maintains all of this in a single structured JSON — battle-tested by thousands of production deployments.
|
||||
|
||||
## The JSON structure
|
||||
|
||||
```json
{
  "gemini/gemini-3-pro-preview": {
    "input_cost_per_token": 0.00000125,
    "output_cost_per_token": 0.000005,
    "litellm_provider": "gemini",
    "mode": "chat",
    "max_tokens": 65536
  }
}
```
|
||||
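To make the per-token fields concrete, here is a minimal sketch of turning one entry of that JSON into a dollar cost. The helper name `llm_cost_usd` is hypothetical, and the prices are the sample values from the snippet above, not verified current pricing. Parsing floats as `Decimal` is a design choice of the sketch to avoid binary rounding on very small per-token prices.

```python
import json
from decimal import Decimal

# Sample entry copied from the structure shown above (illustrative prices).
SAMPLE = """
{
  "gemini/gemini-3-pro-preview": {
    "input_cost_per_token": 0.00000125,
    "output_cost_per_token": 0.000005
  }
}
"""


def llm_cost_usd(prices: dict, model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost = input tokens x input price + output tokens x output price."""
    entry = prices[model]
    return (entry["input_cost_per_token"] * input_tokens
            + entry["output_cost_per_token"] * output_tokens)


# parse_float=Decimal keeps the prices exactly as written in the JSON text.
prices = json.loads(SAMPLE, parse_float=Decimal)
cost = llm_cost_usd(prices, "gemini/gemini-3-pro-preview", 987, 180)
print(cost)
```

A model missing from the JSON raises `KeyError` here; a real service would more likely record the event with no cost and flag the model for pricing.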
|
||||
## What LiteLLM does NOT cover
|
||||
|
||||
- **ElevenLabs** — not an LLM; character-based billing
|
||||
- **Google Cloud TTS** — not an LLM; character-based billing
|
||||
- **Self-hosted models** — no external billing
|
||||
|
||||
These are defined in `pricing/models.yaml` in the cost-tracker repo. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].
|
||||
|
||||
## Keeping prices accurate
|
||||
|
||||
1. Daily Celery beat task (`tasks/pricing_sync.py`) fetches the latest JSON
|
||||
2. If a price changes → admin gets notified; new price record created with `effective_from=today`
|
||||
3. Old price records kept forever for historical reporting
|
||||
4. To freeze at a known-good version: set `LITELLM_COMMIT_HASH` env var
|
||||
|
||||
## The alternative considered
|
||||
|
||||
Direct website scraping was evaluated and rejected due to the problems listed above. LiteLLM is the standard community solution for this exact use case.
|
||||
59
wiki/concepts/preflight-record-pattern.md
Normal file
|
|
@ -0,0 +1,59 @@
|
|||
---
|
||||
title: Preflight + Record Pattern
|
||||
tags: [concept, ai, cost-tracking, patterns]
|
||||
created: 2026-04-27
|
||||
updated: 2026-04-27
|
||||
---
|
||||
|
||||
# Preflight + Record Pattern
|
||||
|
||||
The core usage-tracking pattern used by the AI Cost Tracker SDK. Every paid AI call follows the same three steps.
|
||||
|
||||
## The pattern
|
||||
|
||||
```
|
||||
preflight(estimated_units) → call AI → record(actual_units)
|
||||
```
|
||||
|
||||
1. **Preflight**: before the AI call, ask the cost-tracker: "Is this workspace/project within budget?"
   - Input: model name + estimated units (tokens, chars, etc.)
   - Output: `allow=true/false`, estimated cost, `request_id`
   - If `allow=false` → raise `BudgetExceeded` before calling the AI API

2. **AI call**: the actual paid API call (unmodified)

3. **Record**: after the call, report actual usage
   - Input: `request_id` from preflight + actual units from response
   - Output: `event_id`, `cost_usd`
   - If cost-tracker is unavailable → SDK saves to SQLite outbox and retries in background (see [[wiki/concepts/sync-with-outbox|sync-with-outbox]])
|
||||
|
||||
## Why two calls to the cost-tracker?
|
||||
|
||||
- **Preflight** enables hard budget enforcement **before** money is spent
|
||||
- **Record** captures accurate actual usage (estimated ≠ actual for output tokens)
|
||||
- Decoupling protects the AI pipeline: if cost-tracker goes down after preflight, `record()` still succeeds via outbox
|
||||
|
||||
## Estimation accuracy
|
||||
|
||||
Preflight uses estimated units because output token count is unknown before the call:
|
||||
|
||||
| Provider | What we estimate | Accuracy |
|---|---|---|
| Gemini text | input tokens (`len(text) / 4`), output tokens (caller hint) | ±30% |
| Gemini video | input tokens (file-size table), output tokens (hint) | ±50% |
| ElevenLabs | chars (exact: `len(text)`) | exact |
| Google TTS | chars (exact: `len(text)`) | exact |
|
||||
|
||||
Over-estimation is better than under-estimation for budget enforcement. If you consistently over-estimate by 50%, tune the default `estimated_output_tokens` hint downward.
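The character-based heuristic for Gemini text input can be sketched as below. The function name `estimate_tokens_by_length` is hypothetical and is not claimed to match the SDK's own `estimate_tokens`; rounding up is a choice made here so that the estimate leans toward over-estimation.

```python
import math


def estimate_tokens_by_length(text: str, chars_per_token: int = 4) -> int:
    """Rough input-token estimate: one token per `chars_per_token` characters, rounded up."""
    return math.ceil(len(text) / chars_per_token)
```

This heuristic only applies to text prompts; video input needs the file-size-based lookup mentioned in the table.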
|
||||
|
||||
## Hard limit mechanics
|
||||
|
||||
- Preflight computes `current_month_spend + estimated_cost`
|
||||
- If this exceeds `budget.amount_usd` AND `budget.hard_limit=True` → `allow=false`
|
||||
- The budget check is **eventually consistent** (it reads from pre-aggregated rollups + today's raw events), not transactional, so a brief overage is possible under high concurrency
|
||||
- This is acceptable for AI cost tracking: exact-to-the-cent enforcement would require distributed locks and add unacceptable latency
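The decision rule in the bullets above reduces to one comparison. A minimal sketch follows, assuming hypothetical `Budget` and `PreflightDecision` types and a `check_budget` helper; the field names `amount_usd` and `hard_limit` come from the text, everything else is illustrative. "Exceeds" is read here as strictly greater than the budget.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Budget:
    amount_usd: Decimal
    hard_limit: bool


@dataclass(frozen=True)
class PreflightDecision:
    allow: bool
    projected_spend_usd: Decimal
    deny_reason: Optional[str] = None


def check_budget(budget: Budget, current_month_spend: Decimal,
                 estimated_cost: Decimal) -> PreflightDecision:
    """Deny only when the projected spend exceeds a budget marked as a hard limit."""
    projected = current_month_spend + estimated_cost
    if projected > budget.amount_usd and budget.hard_limit:
        reason = f"projected spend {projected} exceeds budget {budget.amount_usd}"
        return PreflightDecision(False, projected, reason)
    # Soft budgets never block the call; alerts are handled elsewhere.
    return PreflightDecision(True, projected)
```

Because `current_month_spend` comes from rollups rather than a lock-protected counter, two concurrent preflights can both pass this check and together overshoot the budget, which is the brief overage described above.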
|
||||
|
||||
## Projects using this pattern
|
||||
|
||||
- [[wiki/projects-overview/ai-cost-tracker|ai-cost-tracker]] (defines the pattern)
|
||||
- video-accessibility (first consumer, Phase 1)
|
||||
67
wiki/concepts/sync-with-outbox.md
Normal file
|
|
@ -0,0 +1,67 @@
|
|||
---
|
||||
title: Sync HTTP + SQLite Outbox Pattern
|
||||
tags: [concept, architecture, resilience, cost-tracking]
|
||||
created: 2026-04-27
|
||||
updated: 2026-04-27
|
||||
---
|
||||
|
||||
# Sync HTTP + SQLite Outbox Pattern
|
||||
|
||||
The pattern used by the cost-tracker SDK so that usage events are **not lost** when the cost-tracker service is temporarily unavailable.
|
||||
|
||||
## The problem
|
||||
|
||||
The AI pipeline (Celery worker) calls `ct.record(...)` after each AI API call. If the cost-tracker service is down, a naive implementation would either:
|
||||
- Silently drop the event (cost data lost)
|
||||
- Raise an exception (AI pipeline fails — unacceptable)
|
||||
|
||||
## The solution
|
||||
|
||||
```
record() → try POST /v1/usage/record
   ├── success → done
   └── failure (timeout / 5xx / network) → save to SQLite outbox
                                                  ↓
                                     background flusher (every 30s)
                                     retries all pending events with
                                     exponential backoff
```
|
||||
|
||||
## Implementation details
|
||||
|
||||
**SQLite outbox** (one file per worker, default `/tmp/cost_outbox.sqlite`):
|
||||
- Schema: `(id, ts, payload_json, attempts, last_attempt_at, status)`
|
||||
- Written synchronously before returning from `record()` on failure
|
||||
- Never blocks the AI pipeline
|
||||
|
||||
**Background flusher** (asyncio background task):
|
||||
- Starts when `CostTracker` is initialised
|
||||
- Every 30 seconds: reads all `status='pending'` rows, retries `POST /v1/usage/record`
|
||||
- On success: marks `status='sent'`
|
||||
- After 10 failed attempts: marks `status='dead'`, logs warning → human investigation needed
|
||||
|
||||
**Graceful degradation:**
|
||||
- `record()` never raises `CostTrackerUnavailable` — it's fire-and-forget via outbox
|
||||
- `preflight()` returns `allow=true` on connectivity failure by default (`fail_open=True`). Configurable.
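The outbox mechanics described above can be sketched with the standard `sqlite3` module. This is a simplified, synchronous illustration under stated assumptions, not the SDK's implementation: the `Outbox` class, its `send` callable, `flush_once`, and `counts` are hypothetical names, the real flusher is described as an asyncio task running every 30 seconds, and backoff is omitted. In this sketch `attempts` counts flusher retries only, not the initial failed send.

```python
import json
import sqlite3
import time
from typing import Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    payload_json TEXT NOT NULL,
    attempts INTEGER NOT NULL DEFAULT 0,
    last_attempt_at REAL,
    status TEXT NOT NULL DEFAULT 'pending'
)
"""


class Outbox:
    def __init__(self, path: str, send: Callable[[dict], None], max_attempts: int = 10):
        self._db = sqlite3.connect(path)
        self._db.execute(SCHEMA)
        self._db.commit()
        self._send = send  # stands in for POST /v1/usage/record
        self._max_attempts = max_attempts

    def record(self, payload: dict) -> bool:
        """Try to send now; if sending fails, persist the event instead of raising."""
        try:
            self._send(payload)
            return True
        except Exception:
            self._db.execute(
                "INSERT INTO outbox (ts, payload_json) VALUES (?, ?)",
                (time.time(), json.dumps(payload)),
            )
            self._db.commit()
            return False

    def flush_once(self) -> None:
        """One flusher pass: retry every pending row, mark it sent, pending, or dead."""
        rows = self._db.execute(
            "SELECT id, payload_json, attempts FROM outbox WHERE status = 'pending'"
        ).fetchall()
        for row_id, payload_json, attempts in rows:
            try:
                self._send(json.loads(payload_json))
                status = "sent"
            except Exception:
                status = "dead" if attempts + 1 >= self._max_attempts else "pending"
            self._db.execute(
                "UPDATE outbox SET attempts = attempts + 1, last_attempt_at = ?, "
                "status = ? WHERE id = ?",
                (time.time(), status, row_id),
            )
        self._db.commit()

    def counts(self) -> dict:
        """Row count per status, e.g. for an outbox-depth metric."""
        return dict(
            self._db.execute("SELECT status, COUNT(*) FROM outbox GROUP BY status").fetchall()
        )
```

Note that an outbox file under `/tmp` does not survive a reboot or a container replacement, so a persistent path may be preferable where events must outlive the worker.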
|
||||
|
||||
## Configuration
|
||||
|
||||
```python
ct = CostTracker(
    ...
    outbox_path="/tmp/cost_outbox.sqlite",
    flush_interval_seconds=30,
    max_retry_attempts=10,
    fail_open=True,  # preflight returns allow=True when service unreachable
)
```
|
||||
|
||||
## Monitoring
|
||||
|
||||
- Outbox depth reported in SDK's `/metrics` endpoint (if enabled)
|
||||
- `dead` status rows require manual review — add to monitoring alert
|
||||
|
||||
## Where this pattern applies
|
||||
|
||||
- Any Oliver project using the `oliver-cost-tracker` SDK
|
||||
- Generally applicable to any fire-and-forget side-effect call where data loss is unacceptable but the consumer must not block the main flow
|
||||
50
wiki/projects-overview/ai-cost-tracker.md
Normal file
|
|
@ -0,0 +1,50 @@
|
|||
---
|
||||
title: AI Cost Tracker — Project Card
|
||||
tags: [projects, oliver-platform, ai, cost-tracking]
|
||||
created: 2026-04-27
|
||||
updated: 2026-04-27
|
||||
---
|
||||
|
||||
# AI Cost Tracker
|
||||
|
||||
| Field | Value |
|---|---|
| **Repo** | `git@bitbucket.org:zlalani/ai-cost-tracker.git` |
| **Stack** | FastAPI + MongoDB Atlas + Redis + Celery + React 18 + Vite |
| **Auth (admin)** | Microsoft AAD (MSAL) |
| **Auth (API)** | API key per connected project |
| **Status** | Phase 1: building (April 2026) |
| **First consumer** | video-accessibility |
|
||||
|
||||
## What it does
|
||||
|
||||
Centralised AI cost tracking for all Oliver projects. Every project sends preflight + record events; cost-tracker aggregates, stores, and presents analytics by workspace / team / project / user / model.
|
||||
|
||||
See [[wiki/architecture/ai-cost-tracker|ai-cost-tracker architecture]] for full architecture details.
|
||||
|
||||
## Key URLs
|
||||
|
||||
- Admin UI: TBD (e.g. `https://cost.oliver.agency`)
|
||||
- API base: `https://cost.oliver.agency/v1`
|
||||
- Health: `https://cost.oliver.agency/v1/health`
|
||||
|
||||
## Connected projects
|
||||
|
||||
| Project | Source app name | Connected since |
|
||||
|---|---|---|
|
||||
| video-accessibility | `video-accessibility` | Phase 1 (April 2026) |
|
||||
|
||||
## How to connect a new project
|
||||
|
||||
See [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — full step-by-step guide.
|
||||
|
||||
## Related articles
|
||||
|
||||
- [[wiki/architecture/ai-cost-tracker|Architecture overview]]
|
||||
- [[wiki/tech-patterns/cost-tracker-integration|Integration guide (new projects)]]
|
||||
- [[wiki/tech-patterns/cost-tracker-pricing-sources|Pricing sources]]
|
||||
- [[wiki/tech-patterns/cost-tracker-providers|Provider billing units]]
|
||||
- [[wiki/concepts/preflight-record-pattern|Preflight + Record pattern]]
|
||||
- [[wiki/concepts/lazy-user-mirror|Lazy user mirror]]
|
||||
- [[wiki/concepts/sync-with-outbox|Sync HTTP + outbox pattern]]
|
||||
- [[wiki/concepts/litellm-pricing-source|LiteLLM as pricing source]]
|
||||
|
|
@ -24,6 +24,9 @@ Recurring technology stacks used across Oliver Agency projects. Each article cov
|
|||
| [[wiki/tech-patterns/one2edit-api\|one2edit-api]] | One2Edit translation platform API | 3M Portal, H&M O2E Tool |
|
||||
| [[wiki/tech-patterns/nodejs-vanilla-proxy\|nodejs-vanilla-proxy]] | Node.js + Vanilla JS lightweight proxy tools | 3M Portal, Ferrero, Homepage |
|
||||
| [[wiki/tech-patterns/kling-veo-video-api\|kling-veo-video-api]] | Kling AI + Google Veo 3.1 video generation — camera control, I2V, polling | Cinema Studio Pro |
|
||||
| [[wiki/tech-patterns/cost-tracker-integration\|cost-tracker-integration]] | Step-by-step guide: connect any Oliver project to ai-cost-tracker (API key, SDK install, wrap AI calls, budgets) | All Oliver projects |
|
||||
| [[wiki/tech-patterns/cost-tracker-pricing-sources\|cost-tracker-pricing-sources]] | Three-layer pricing pipeline: LiteLLM auto-sync → YAML (non-LLM) → admin override (later layers take precedence); historical effective_from/to | ai-cost-tracker |
|
||||
| [[wiki/tech-patterns/cost-tracker-providers\|cost-tracker-providers]] | Billing units per AI provider: Gemini tokens (usage_metadata), ElevenLabs chars, Google TTS chars | All AI projects |
|
||||
|
||||
## Quick Decision Guide
|
||||
|
||||
|
|
|
|||
232
wiki/tech-patterns/cost-tracker-integration.md
Normal file
|
|
@ -0,0 +1,232 @@
|
|||
---
|
||||
title: AI Cost Tracker — Integrating a New Project
|
||||
tags: [how-to, ai, cost-tracking, integration]
|
||||
created: 2026-04-27
|
||||
updated: 2026-04-27
|
||||
---
|
||||
|
||||
# Integrating a New Project with AI Cost Tracker
|
||||
|
||||
Step-by-step guide for connecting any Oliver backend project to the shared cost-tracker.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- You have a Bitbucket account with access to `zlalani/ai-cost-tracker`
|
||||
- You can reach the cost-tracker admin UI (ask for the domain)
|
||||
- Your project is a FastAPI + Python backend (adaptable for other stacks)
|
||||
|
||||
---
|
||||
|
||||
## Step 1 — Get an API key
|
||||
|
||||
1. Open the cost-tracker Admin UI → **API Keys** → **Create key**
|
||||
2. Name it after your project (e.g. `video-accessibility-prod`)
|
||||
3. Scope: `preflight`, `record`, `upsert`
|
||||
4. Copy the key — it is shown **only once**
|
||||
5. Store in your project's GCP Secret Manager (same pattern as `GEMINI_API_KEY`)
|
||||
|
||||
---
|
||||
|
||||
## Step 2 — Install the SDK
|
||||
|
||||
```bash
pip install oliver-cost-tracker
# or, while the private package isn't published yet:
# git submodule add git@bitbucket.org:zlalani/ai-cost-tracker.git vendor/cost-tracker
# pip install -e vendor/cost-tracker/sdk/
```
|
||||
|
||||
---
|
||||
|
||||
## Step 3 — Add environment variables
|
||||
|
||||
```env
COST_TRACKER_BASE_URL=https://cost.oliver.agency
COST_TRACKER_API_KEY=ct_live_xxxxxxxxxxxxxxxxxxxx
COST_TRACKER_SOURCE_APP=video-accessibility
COST_TRACKER_OUTBOX_PATH=/tmp/cost_outbox.sqlite
```
|
||||
|
||||
---
|
||||
|
||||
## Step 4 — Initialise the client
|
||||
|
||||
In `core/dependencies.py` (FastAPI):
|
||||
|
||||
```python
from functools import lru_cache

from oliver_cost_tracker import CostTracker

# `settings` is your project's own config object; import it from wherever
# your project defines it.


@lru_cache
def get_cost_tracker() -> CostTracker:
    """Return one shared CostTracker per process (cached by lru_cache)."""
    return CostTracker(
        base_url=settings.cost_tracker_base_url,
        api_key=settings.cost_tracker_api_key,
        source_app=settings.cost_tracker_source_app,
        outbox_path=settings.cost_tracker_outbox_path,
    )
```
|
||||
|
||||
In Celery workers, instantiate once at module level:
|
||||
|
||||
```python
cost_tracker = CostTracker(
    base_url=settings.cost_tracker_base_url,
    api_key=settings.cost_tracker_api_key,
    source_app="video-accessibility",
    outbox_path="/tmp/cost_outbox.sqlite",
)
```
|
||||
|
||||
---
|
||||
|
||||
## Step 5 — Wrap AI calls (preflight → call → record)
|
||||
|
||||
This is the **core pattern**. Every paid AI call follows three steps.
|
||||
See [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] for the full explanation.
|
||||
|
||||
```python
import time

from google import genai
from google.genai.types import GenerateContentResponse
from oliver_cost_tracker import BudgetExceeded, CostTracker

MODEL = "gemini-3-pro-preview"

# Assumption: `client` is the async surface of a google-genai client,
# which reads its API key from the environment.
client = genai.Client().aio


async def call_gemini_with_tracking(
    ct: CostTracker,
    prompt: str,
    user_id: str,
    job_id: str,
    project_id: str | None = None,
) -> GenerateContentResponse:
    # 1. Preflight: checks budget, returns allow/deny
    preflight = await ct.preflight(
        user_external_id=user_id,
        project_external_id=project_id,
        job_external_id=job_id,
        model=MODEL,
        estimated_input_tokens=ct.estimate_tokens(prompt),
        estimated_output_tokens=2048,  # conservative overestimate
    )
    if not preflight.allow:
        raise BudgetExceeded(preflight.deny_reason)

    # 2. AI call, timed so latency can be recorded
    t0 = time.monotonic()
    response = await client.models.generate_content(
        model=MODEL,
        contents=prompt,
    )
    elapsed_ms = int((time.monotonic() - t0) * 1000)

    # 3. Record actual usage as reported by the provider
    await ct.record(
        request_id=preflight.request_id,
        user_external_id=user_id,
        project_external_id=project_id,
        job_external_id=job_id,
        model=MODEL,
        input_tokens=response.usage_metadata.prompt_token_count,
        output_tokens=response.usage_metadata.candidates_token_count,
        latency_ms=elapsed_ms,
        status="success",
    )
    return response
```
|
||||
|
||||
For providers without usage metadata (ElevenLabs, Google Cloud TTS), compute units from input:
|
||||
|
||||
```python
await ct.record(
    ...
    model="eleven_multilingual_v2",
    chars=len(text),  # billing unit is characters
    latency_ms=elapsed_ms,
    status="success",
)
```
|
||||
|
||||
See [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] for all provider details.
|
||||
|
||||
---
|
||||
|
||||
## Step 6 — Attribution in background tasks (Celery)
|
||||
|
||||
Celery tasks typically receive only `job_id`. Fetch `user_id` and `project_id` from the job document:
|
||||
|
||||
```python
import asyncio


async def _run_ai_task(job_id: str) -> None:
    job = await db.jobs.find_one({"_id": job_id}, {"client_id": 1, "project_id": 1})
    user_id = job["client_id"]
    project_id = job.get("project_id")  # may be None for old jobs

    result = await call_gemini_with_tracking(
        ct=cost_tracker,
        prompt=build_prompt(job),
        user_id=user_id,
        job_id=job_id,
        project_id=project_id,
    )


@celery_app.task
def my_ai_task(job_id: str) -> None:
    # Celery calls task functions synchronously and does not await coroutines,
    # so the async code is driven explicitly here.
    asyncio.run(_run_ai_task(job_id))
```
|
||||
|
||||
For per-cue Celery tasks (like `tts_synthesis.py`), add `user_id` and `project_id` to the task kwargs to avoid an extra DB fetch per cue.
|
||||
|
||||
---
|
||||
|
||||
## Step 7 — Create Workspace / Team / Project in Admin UI
|
||||
|
||||
Before going live, set up the org structure in the cost-tracker Admin UI:
|
||||
|
||||
1. **Workspace** → Create (e.g. "Ford", "H&M", or "video-accessibility" for internal use)
|
||||
2. **Team** → Create under the workspace (e.g. "Video Production")
|
||||
3. **Project** → Create under the team, set `source_app` = your project name and `external_id` = the ID your project will send in `project_external_id`
|
||||
|
||||
> Jobs sent before a project is created appear as **Unassigned** in the dashboard. You can reassign them in bulk later.
|
||||
|
||||
---
|
||||
|
||||
## Step 8 — Set budgets and alerts
|
||||
|
||||
In Admin UI → **Budgets** → Create:
|
||||
|
||||
- `scope_type`: workspace / team / project
|
||||
- `amount_usd`: monthly limit
|
||||
- `alert_thresholds`: [0.5, 0.8, 1.0] (email at 50%, 80%, 100%)
|
||||
- `hard_limit`: true = preflight returns `allow=false` when exceeded
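How `alert_thresholds` might translate into emails can be sketched as follows. The source only states that alerts go out at 50%, 80%, and 100%; the idea of alerting once per threshold, at the moment spend moves past it, is an assumption of this sketch, and `crossed_thresholds` is a hypothetical helper.

```python
from typing import List, Sequence


def crossed_thresholds(previous_spend: float, new_spend: float, amount_usd: float,
                       thresholds: Sequence[float] = (0.5, 0.8, 1.0)) -> List[float]:
    """Return the thresholds that this spend increase has newly reached."""
    return [
        t for t in sorted(thresholds)
        if previous_spend < t * amount_usd <= new_spend
    ]
```

With a 100 USD budget, a jump from 40 to 85 USD would trigger both the 50% and 80% alerts in one go, while a later move from 85 to 90 USD would trigger none.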
|
||||
|
||||
---
|
||||
|
||||
## Step 9 — Smoke test
|
||||
|
||||
```bash
# 1. Check service is up
curl https://cost.oliver.agency/v1/health

# 2. Preflight
curl -X POST https://cost.oliver.agency/v1/preflight \
  -H "X-API-Key: ct_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"user_external_id":"test-user","model":"gemini-3-pro-preview","estimated_units":{"input_tokens":1000,"output_tokens":200}}'
# expect: {"allow":true,"estimated_cost_usd":...,"request_id":"..."}

# 3. Record (replace <from above> with the request_id returned by the preflight call)
curl -X POST https://cost.oliver.agency/v1/usage/record \
  -H "X-API-Key: ct_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"request_id":"<from above>","user_external_id":"test-user","model":"gemini-3-pro-preview","units":{"input_tokens":987,"output_tokens":180},"latency_ms":1200,"status":"success"}'
# expect: {"event_id":"...","cost_usd":...}  (exact value depends on current pricing)
```
|
||||
|
||||
Then open the Admin UI Dashboard — the test event should appear within seconds.
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Symptom | Cause | Fix |
|---|---|---|
| `preflight` returns 401 | Wrong or missing API key | Check `COST_TRACKER_API_KEY` env var; verify key is active in Admin UI |
| `preflight` returns `allow=false` | Budget exceeded | Admin UI → Budgets → raise limit or wait for next billing period |
| Events not appearing in dashboard | Outbox accumulating (service down) | Check `/tmp/cost_outbox.sqlite`; the SDK's background flusher retries automatically once the service is back up |
| `cost_usd=null` on events | Model not in pricing table | Admin UI → Pricing → add model, or check LiteLLM sync task ran |
| Slow preflight (>500ms) | cost-tracker under load or network | SDK retries automatically; if persistent, check service metrics |
| Token estimates wildly off | Char/4 heuristic applied to video prompts | Gemini video needs file_size-based lookup; see [[wiki/tech-patterns/cost-tracker-providers\|cost-tracker-providers]] |
|
||||
131
wiki/tech-patterns/cost-tracker-pricing-sources.md
Normal file
|
|
@ -0,0 +1,131 @@
|
|||
---
|
||||
title: AI Cost Tracker — Pricing Sources
|
||||
tags: [how-to, ai, cost-tracking, pricing]
|
||||
created: 2026-04-27
|
||||
updated: 2026-04-27
|
||||
---
|
||||
|
||||
# AI Cost Tracker — Pricing Sources
|
||||
|
||||
The cost-tracker uses a **three-layer hybrid pricing pipeline**. Understanding the priority order is essential for accurate billing attribution.
|
||||
|
||||
## Priority order
|
||||
|
||||
```
override (highest: set manually by admin)
yaml     (fallback: versioned in repo for non-LLM providers)
litellm  (lowest: auto-synced daily from open source)
```
|
||||
|
||||
`compute_cost(provider, model, units, ts)` returns the cost using the **highest-priority active price** for the given model at timestamp `ts`.
|
||||
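The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual `compute_cost`: the `ModelPrice` dataclass, the `SOURCE_PRIORITY` map, the half-open validity interval, and the `resolve_price` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Assumed ranking that mirrors the priority order above (higher wins).
SOURCE_PRIORITY = {"litellm": 0, "yaml": 1, "override": 2}


@dataclass
class ModelPrice:
    model: str
    source: str  # "litellm" | "yaml" | "override"
    price_per_unit_usd: float
    effective_from: datetime
    effective_to: Optional[datetime] = None  # None means still open


def is_active(price: ModelPrice, ts: datetime) -> bool:
    """Assumed half-open validity interval [effective_from, effective_to)."""
    if ts < price.effective_from:
        return False
    return price.effective_to is None or ts < price.effective_to


def resolve_price(prices: List[ModelPrice], model: str, ts: datetime) -> Optional[ModelPrice]:
    """Pick the highest-priority price that is active for `model` at `ts`."""
    candidates = [p for p in prices if p.model == model and is_active(p, ts)]
    if not candidates:
        return None  # the caller would flag the event as missing a price
    return max(candidates, key=lambda p: (SOURCE_PRIORITY[p.source], p.effective_from))
```

Passing the event timestamp rather than "now" is what lets the same function serve both live calls and historical lookups.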
|
||||
---
|
||||
|
||||
## Layer 1 — LiteLLM auto-sync (LLM providers)
|
||||
|
||||
**Source:** `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`
|
||||
|
||||
**Coverage:** Gemini, OpenAI, Anthropic, Cohere, Mistral, Together, and 100+ others.
|
||||
|
||||
**Sync schedule:** Celery beat task `tasks/pricing_sync.py` runs **daily at 02:00 UTC**.
|
||||
|
||||
**What happens on sync:**

1. Fetches the JSON (pinned to a configurable commit hash in the `LITELLM_COMMIT_HASH` env var)
2. Maps `input_cost_per_token` / `output_cost_per_token` to our schema
3. For each model:
   - If no existing price → creates new `model_prices` record with `source="litellm"`
   - If price unchanged → updates `litellm_commit_hash`, no other change
   - If price **changed** → closes old record (`effective_to=today`), creates new record, sends **admin notification email**
|
||||
|
||||
> **Note:** The sync never modifies `source="override"` records. If an override is active, the sync logs a divergence warning and leaves the override intact.
|
||||
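The per-model branch of the sync could look roughly like the sketch below. It is an in-memory illustration only: `PriceRecord`, `sync_model`, and the returned status strings are hypothetical names, and the real task presumably works against the `model_prices` table and sends the notification email itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PriceRecord:
    model: str
    input_cost_per_token: float
    output_cost_per_token: float
    effective_from: date
    effective_to: Optional[date] = None
    source: str = "litellm"
    litellm_commit_hash: str = ""


def sync_model(records: List[PriceRecord], model: str, entry: dict,
               commit_hash: str, today: date) -> str:
    """Apply one LiteLLM entry; returns "created", "unchanged" or "changed"."""
    new_in = entry["input_cost_per_token"]
    new_out = entry["output_cost_per_token"]
    # Only open litellm records are considered, so overrides are never touched.
    current = next(
        (r for r in records
         if r.model == model and r.source == "litellm" and r.effective_to is None),
        None,
    )
    if current is not None and (
        current.input_cost_per_token, current.output_cost_per_token
    ) == (new_in, new_out):
        current.litellm_commit_hash = commit_hash  # price unchanged: refresh hash only
        return "unchanged"
    if current is not None:
        current.effective_to = today  # close the old record, keep it for history
    records.append(
        PriceRecord(model, new_in, new_out, today, litellm_commit_hash=commit_hash)
    )
    # On "changed" the caller would send the admin notification.
    return "created" if current is None else "changed"
```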
|
||||
**To pin a specific version** (for reproducibility):
|
||||
```env
LITELLM_COMMIT_HASH=abc123def456 # pin to a known-good commit
```
|
||||
|
||||
See [[wiki/concepts/litellm-pricing-source|litellm-pricing-source]] for deeper explanation.
|
||||
|
||||
---
|
||||
|
||||
## Layer 2 — YAML (non-LLM providers)
|
||||
|
||||
**File:** `backend/app/pricing/models.yaml` — versioned in the cost-tracker repo.
|
||||
|
||||
Contains providers that LiteLLM does not cover:
|
||||
|
||||
```yaml
# ElevenLabs
- provider: elevenlabs
  model: eleven_multilingual_v2
  billing_unit: char
  price_per_1k_usd: 0.30
  effective_from: "2025-01-01"

- provider: elevenlabs
  model: eleven_flash_v2_5
  billing_unit: char
  price_per_1k_usd: 0.11
  effective_from: "2025-01-01"

# Google Cloud TTS
- provider: google_tts
  model: standard
  billing_unit: char
  price_per_1m_usd: 4.00
  effective_from: "2024-01-01"

- provider: google_tts
  model: wavenet
  billing_unit: char
  price_per_1m_usd: 16.00
  effective_from: "2024-01-01"
```
|
||||
|
||||
**When to update YAML:**
|
||||
- ElevenLabs raises/lowers per-char pricing
|
||||
- Google Cloud TTS changes tier pricing
|
||||
- Adding a brand-new non-LLM provider
|
||||
|
||||
**How to update:**
|
||||
1. Add a new entry with the new price and `effective_from: "YYYY-MM-DD"`
|
||||
2. Leave the old entry — it is used for historical cost attribution
|
||||
3. Deploy the new YAML → loader upserts on startup
|
||||
|
||||
**Do NOT delete old entries** — they are needed for retroactive reports.
|
||||
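To illustrate why old entries must stay in place, here is a small sketch of date-aware cost calculation over entries shaped like the YAML above. The entries are written as plain dicts (as a YAML parser would return them); the helper names `price_per_char`, `entry_for`, and `char_cost_usd` are hypothetical and not part of the cost-tracker code.

```python
from datetime import date
from typing import List, Optional

# Two entries copied from the YAML above, as a parser would return them.
ENTRIES = [
    {"provider": "elevenlabs", "model": "eleven_multilingual_v2",
     "billing_unit": "char", "price_per_1k_usd": 0.30, "effective_from": "2025-01-01"},
    {"provider": "google_tts", "model": "standard",
     "billing_unit": "char", "price_per_1m_usd": 4.00, "effective_from": "2024-01-01"},
]


def price_per_char(entry: dict) -> float:
    """Normalise the two price keys used in the file to USD per character."""
    if "price_per_1k_usd" in entry:
        return entry["price_per_1k_usd"] / 1_000
    return entry["price_per_1m_usd"] / 1_000_000


def entry_for(entries: List[dict], provider: str, model: str, on: date) -> Optional[dict]:
    """Latest entry whose effective_from is on or before the given date."""
    matching = [
        e for e in entries
        if e["provider"] == provider and e["model"] == model
        and date.fromisoformat(e["effective_from"]) <= on
    ]
    if not matching:
        return None
    # ISO dates sort correctly as strings.
    return max(matching, key=lambda e: e["effective_from"])


def char_cost_usd(entries: List[dict], provider: str, model: str,
                  chars: int, on: date) -> Optional[float]:
    entry = entry_for(entries, provider, model, on)
    return None if entry is None else chars * price_per_char(entry)
```

If an older entry were deleted, `entry_for` would return `None` (or a newer price) for past dates, which is exactly the retroactive-report breakage the rule above guards against.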
|
||||
---
|
||||
|
||||
## Layer 3 — Admin override (UI)
|
||||
|
||||
Use when you have:
|
||||
- A negotiated enterprise contract price (different from public pricing)
|
||||
- A volume discount or committed-use agreement
|
||||
- A temporary promotional rate
|
||||
- A price correction before the next LiteLLM sync
|
||||
|
||||
**How to create an override:**
|
||||
1. Admin UI → **Pricing** → find the model → **Override price**
|
||||
2. Set: `price_per_unit_usd`, `effective_from` (defaults to today), optional `override_reason`
|
||||
3. Save → the previous price record is closed (its `effective_to` is set to the override's `effective_from`) and the override becomes active
|
||||
|
||||
Override records are never auto-modified by LiteLLM sync.
|
||||
|
||||
---
|
||||
|
||||
## Historical pricing and retroactive reports
|
||||
|
||||
Every usage event is stored with `price_id` — a reference to the exact `model_prices` record active at the time of the call:
|
||||
|
||||
- **Retroactive reports are always accurate** — changing a price today does not affect yesterday's costs
|
||||
- Old `model_prices` records with `effective_to` set are never deleted
|
||||
- Re-evaluating historical costs under new pricing is not a built-in feature; it requires a manual export and a spreadsheet
|
||||
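For the manual re-evaluation case, a short script can stand in for the spreadsheet. The sketch below assumes an exported CSV with `units` and `cost_usd` columns; those column names and the `reprice` helper are hypothetical, since the export format is not described here.

```python
import csv
import io
from typing import Tuple


def reprice(csv_text: str, new_price_per_unit_usd: float) -> Tuple[float, float]:
    """Return (recorded_total, repriced_total) for an exported events CSV.

    Assumed columns: `units` (billed units per event) and `cost_usd`
    (the cost recorded at the time of the call).
    """
    recorded = 0.0
    repriced = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        recorded += float(row["cost_usd"])
        repriced += int(row["units"]) * new_price_per_unit_usd
    return recorded, repriced


# Example export with two events, repriced at a hypothetical new rate.
sample = "units,cost_usd\n1000,0.30\n500,0.15\n"
recorded_total, repriced_total = reprice(sample, 0.0004)
```

The stored events are left untouched; the comparison happens entirely on the exported copy.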
|
||||
---
|
||||
|
||||
## Monthly reconciliation
|
||||
|
||||
Recommended monthly check:
|
||||
1. Download invoice from Google Cloud Console / ElevenLabs dashboard
|
||||
2. Compare with cost-tracker "Actual vs Billed" report (Admin UI → Analytics → Reconciliation)
|
||||
3. If >5% discrepancy: check for `pricing_missing=true` events and add missing model prices
|
||||
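The 5% check can be expressed as a few lines of arithmetic. This is only a sketch: `discrepancy_pct`, `needs_investigation`, and the event dicts with a `pricing_missing` key are illustrative assumptions, not the report's actual implementation.

```python
from typing import List


def discrepancy_pct(tracked_usd: float, billed_usd: float) -> float:
    """Absolute difference between tracked and invoiced cost, as % of the invoice."""
    if billed_usd == 0:
        raise ValueError("billed_usd must be non-zero")
    return abs(tracked_usd - billed_usd) / billed_usd * 100


def needs_investigation(tracked_usd: float, billed_usd: float,
                        threshold_pct: float = 5.0) -> bool:
    return discrepancy_pct(tracked_usd, billed_usd) > threshold_pct


def events_missing_price(events: List[dict]) -> List[dict]:
    """Events flagged `pricing_missing=true`; these carry no cost and skew the total."""
    return [e for e in events if e.get("pricing_missing")]
```

For example, a tracked total of $94 against a $100 invoice is a 6% gap and would trigger the follow-up step, while $97 against $100 would not.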
141
wiki/tech-patterns/cost-tracker-providers.md
Normal file
|
|
@ -0,0 +1,141 @@
|
|||
---
|
||||
title: AI Cost Tracker — Billing Units per Provider
|
||||
tags: [reference, ai, cost-tracking, providers]
|
||||
created: 2026-04-27
|
||||
updated: 2026-04-27
|
||||
---
|
||||
|
||||
# Billing Units per Provider
|
||||
|
||||
Reference for how each AI provider bills and how to extract usage data from their API responses.
|
||||
|
||||
## Gemini (Google AI / Vertex AI)
|
||||
|
||||
**Billing unit:** tokens (input + output separately)
|
||||
|
||||
**SDK:** `google-genai` Python SDK
|
||||
|
||||
**How to get usage:**
|
||||
```python
from google import genai

client = genai.Client()  # reads the API key from the environment

# Async calls go through `client.aio`; `client.models` is the synchronous API.
response = await client.aio.models.generate_content(...)

input_tokens = response.usage_metadata.prompt_token_count
output_tokens = response.usage_metadata.candidates_token_count
total_tokens = response.usage_metadata.total_token_count
```
|
||||
|
||||
> ⚠️ `usage_metadata` is available on all `generate_content` responses including multimodal (video + text prompts). It was **not being read** in video-accessibility before the cost-tracker integration — added as part of Phase 1.
|
||||
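A small helper can turn `usage_metadata` into the `units` payload used when recording an event. This is a sketch under assumptions: `units_from_usage` is a hypothetical name, the `input_tokens` / `output_tokens` keys follow the record example in the integration guide, and missing counts are treated as zero because a count may be unset on some responses.

```python
from types import SimpleNamespace


def units_from_usage(usage_metadata) -> dict:
    """Map Gemini usage_metadata fields onto cost-tracker unit keys."""
    def count(name: str) -> int:
        # Treat a missing attribute or a None value as zero tokens.
        return getattr(usage_metadata, name, None) or 0

    return {
        "input_tokens": count("prompt_token_count"),
        "output_tokens": count("candidates_token_count"),
    }


# Stand-in object for illustration; a real call would pass response.usage_metadata.
fake_usage = SimpleNamespace(prompt_token_count=987, candidates_token_count=180)
units = units_from_usage(fake_usage)
```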
|
||||
**Token estimation before the call:**
|
||||
- Text: `len(text) / 4` (rough heuristic; actual tokenisation varies ±30%)
|
||||
- Video file: use Google's published token table:
|
||||
- < 1 min video ≈ 1,000–2,000 tokens + audio
|
||||
- Exact: check `google.genai` file metadata after upload
|
||||
- Image: ~258 tokens per 512×512 tile
|
||||
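The text heuristic is simple enough to sketch directly. The function names below are hypothetical, and the ±30% band is just the rough variation quoted above, not a guarantee.

```python
from typing import Tuple


def estimate_text_tokens(text: str) -> int:
    """Rough pre-call estimate: about one token per four characters."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def estimate_token_range(text: str, tolerance_pct: int = 30) -> Tuple[int, int, int]:
    """Return (low, estimate, high) using integer maths for the tolerance band."""
    est = estimate_text_tokens(text)
    low = est * (100 - tolerance_pct) // 100
    high = (est * (100 + tolerance_pct) + 99) // 100  # round up
    return low, est, high
```

Using the upper end of the band for the preflight estimate is the more conservative choice when budgets are tight; the real token counts from `usage_metadata` replace the estimate once the call returns.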
|
||||
**Pricing:** auto-synced from LiteLLM. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].
|
||||
|
||||
---
|
||||
|
||||
## Gemini TTS (audio generation via generate_content)
|
||||
|
||||
**Billing unit:** tokens (output audio tokens, different rate from text)
|
||||
|
||||
**SDK:** same `google-genai`, with `response_modalities=["AUDIO"]`
|
||||
|
||||
**How to get usage:**
|
||||
```python
from google.genai import types

# Async calls go through `client.aio`.
response = await client.aio.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=...,
    config=types.GenerateContentConfig(response_modalities=["AUDIO"]),
)
output_tokens = response.usage_metadata.candidates_token_count
```
|
||||
|
||||
Audio output token rate differs from text output rate — verify in LiteLLM for model `gemini-2.5-flash-preview-tts`.
|
||||
|
||||
---
|
||||
|
||||
## ElevenLabs TTS
|
||||
|
||||
**Billing unit:** characters (input text length)
|
||||
|
||||
**SDK:** custom HTTP (`aiohttp` POST to `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}`)
|
||||
|
||||
**Response:** returns raw audio bytes. **No usage metadata in response.**
|
||||
|
||||
**How to measure:** compute `len(text)` at the call site **before** making the request:
|
||||
|
||||
```python
char_count = len(text)
# make the ElevenLabs call
await ct.record(..., chars=char_count, model="eleven_multilingual_v2", ...)
```
|
||||
|
||||
**Subscription vs pay-as-you-go:** ElevenLabs bills against a monthly character quota. When quota is exceeded, pay-as-you-go rate applies. The cost-tracker assumes pay-as-you-go for all characters (conservative upper bound). Adjust via admin override if on a subscription plan.
|
||||
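The difference between the conservative upper bound and a quota-aware figure can be sketched as below. Both helpers are hypothetical, and the quota model (characters inside the monthly quota cost nothing extra, overage is charged at the pay-as-you-go rate) is a simplifying assumption rather than a description of any specific ElevenLabs plan.

```python
def payg_upper_bound_usd(chars: int, price_per_1k_usd: float) -> float:
    """What the cost-tracker records: every character at the pay-as-you-go rate."""
    return chars / 1000 * price_per_1k_usd


def overage_cost_usd(chars_this_call: int, chars_used_so_far: int,
                     monthly_quota: int, price_per_1k_usd: float) -> float:
    """Assumed quota model: only characters beyond the remaining quota are charged."""
    remaining = max(0, monthly_quota - chars_used_so_far)
    billable = max(0, chars_this_call - remaining)
    return billable / 1000 * price_per_1k_usd
```

With a 10,000-character quota, 9,500 characters already used, and a 2,000-character call at $0.30 per 1k, the upper bound is $0.60 while the quota-aware figure covers only the 1,500 overage characters.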
|
||||
---
|
||||
|
||||
## Google Cloud TTS
|
||||
|
||||
**Billing unit:** characters (input text length, after SSML stripping)
|
||||
|
||||
**SDK:** `google.cloud.texttospeech` Python SDK
|
||||
|
||||
**Response:** `SynthesizeSpeechResponse` with `audio_content` (bytes). **No character count in response.**
|
||||
|
||||
**How to measure:**
|
||||
```python
char_count = len(synthesis_input.text)
# For SSML input, `text` is empty and Google bills the stripped char count;
# approximate with len(synthesis_input.ssml)
await ct.record(..., chars=char_count, model="standard", ...)
```
|
||||
|
||||
**Voice tiers and pricing:**
|
||||
|
||||
| Voice type | Billing model name | Price per 1M chars |
|---|---|---|
| Standard | `google_tts/standard` | $4.00 |
| WaveNet | `google_tts/wavenet` | $16.00 |
| Neural2 | `google_tts/neural2` | $16.00 |
| Studio | `google_tts/studio` | $160.00 |
|
||||
|
||||
Defined in `backend/app/pricing/models.yaml` in the cost-tracker repo.
|
||||
|
||||
---
|
||||
|
||||
## OpenAI (future)
|
||||
|
||||
**Billing unit:** tokens (input + output)
|
||||
|
||||
```python
response = client.chat.completions.create(...)
input_tokens = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens
```
|
||||
|
||||
**Pricing:** auto-synced from LiteLLM.
|
||||
|
||||
---
|
||||
|
||||
## Anthropic Claude (future)
|
||||
|
||||
**Billing unit:** tokens (input + output)
|
||||
|
||||
```python
response = client.messages.create(...)
input_tokens = response.usage.input_tokens
output_tokens = response.usage.output_tokens
```
|
||||
|
||||
**Pricing:** auto-synced from LiteLLM.
|
||||
|
||||
---
|
||||
|
||||
## Whisper (self-hosted)
|
||||
|
||||
**Not billed per token.** Runs on Cloud Run / GPU compute.
|
||||
|
||||
Billing is infrastructure cost (compute time), which Phase 1 does not track.

Phase 2 (planned): track `audio_duration_seconds` and approximate cost from Cloud Run billing data.
|
||||