Preflight + Record Pattern

The core usage-tracking pattern used by the AI Cost Tracker SDK. Every paid AI call follows the same three steps.

The pattern

preflight(estimated_units) → call AI → record(actual_units)

1. Preflight: before the AI call, ask the cost-tracker whether this workspace/project is within budget.
   - Input: model name + estimated units (tokens, chars, etc.)
   - Output: allow=true/false, estimated cost, request_id
   - If allow=false → raise BudgetExceeded before calling the AI API
2. AI call: the actual paid API call (unmodified).
3. Record: after the call, report actual usage.
   - Input: request_id from preflight + actual units from response
   - Output: event_id, cost_usd
   - If cost-tracker is unavailable → SDK saves to SQLite outbox and retries in background (see wiki/concepts/sync-with-outbox)
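
The three steps can be sketched in Python as follows. This is a minimal illustration, not the SDK's actual API: `FakeTracker`, `PreflightResult`, `tracked_call`, and the flat per-unit price are hypothetical names and assumptions, and the tracker here is an in-memory stand-in that treats every budget as a hard limit.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Tuple


class BudgetExceeded(Exception):
    """Raised when preflight returns allow=False."""


@dataclass
class PreflightResult:
    allow: bool
    estimated_cost_usd: float
    request_id: str


class FakeTracker:
    """Hypothetical in-memory stand-in for the cost-tracker service."""

    def __init__(self, budget_usd: float, price_per_unit_usd: float) -> None:
        self.budget_usd = budget_usd
        self.price = price_per_unit_usd  # assumed flat price per unit
        self.spent_usd = 0.0

    def preflight(self, model: str, estimated_units: int) -> PreflightResult:
        cost = estimated_units * self.price
        allow = self.spent_usd + cost <= self.budget_usd
        return PreflightResult(allow, cost, str(uuid.uuid4()))

    def record(self, request_id: str, actual_units: int) -> dict:
        cost = actual_units * self.price
        self.spent_usd += cost
        return {"event_id": str(uuid.uuid4()), "cost_usd": cost}


def tracked_call(
    tracker: FakeTracker,
    model: str,
    estimated_units: int,
    call_ai: Callable[[], Tuple[str, int]],
) -> Tuple[str, dict]:
    # Step 1: preflight with the estimate; stop before any money is spent.
    pre = tracker.preflight(model, estimated_units)
    if not pre.allow:
        raise BudgetExceeded(f"estimated cost {pre.estimated_cost_usd:.4f} USD")
    # Step 2: the paid call itself, unmodified. It is assumed to return
    # the response text and the actual units it consumed.
    text, actual_units = call_ai()
    # Step 3: record actual usage against the preflight request_id.
    event = tracker.record(pre.request_id, actual_units)
    return text, event
```

In this sketch `call_ai` is never invoked when preflight denies the request, which is the property the pattern relies on.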

Why two steps?

- Preflight enables hard budget enforcement before money is spent.
- Record captures accurate actual usage (estimated ≠ actual for output tokens).
- Decoupling protects the AI pipeline: if cost-tracker goes down after preflight, record() still succeeds via outbox.
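
To illustrate the outbox fallback, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, the `Outbox` class, the `record_usage` helper, and the use of `ConnectionError` to signal an unavailable cost-tracker are all assumptions made for this example; the real SDK retries in a background task, whereas this sketch exposes an explicit `flush()`.

```python
import json
import sqlite3
from typing import Callable


class Outbox:
    """Hypothetical SQLite-backed queue for usage records that could not be sent."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, payload: dict) -> None:
        self.db.execute(
            "INSERT INTO outbox (payload) VALUES (?)", (json.dumps(payload),)
        )
        self.db.commit()

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, send: Callable[[dict], None]) -> int:
        """Try to deliver queued records in order; stop at the first failure."""
        rows = self.db.execute(
            "SELECT id, payload FROM outbox ORDER BY id"
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            try:
                send(json.loads(payload))
            except ConnectionError:
                break  # tracker still down; keep the remaining rows queued
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent


def record_usage(
    send: Callable[[dict], None], outbox: Outbox, request_id: str, actual_units: int
) -> bool:
    """Return True if delivered immediately, False if queued for retry."""
    payload = {"request_id": request_id, "actual_units": actual_units}
    try:
        send(payload)
        return True
    except ConnectionError:
        outbox.enqueue(payload)
        return False
```

Either way the caller gets a normal return value rather than an exception, so a cost-tracker outage does not interrupt the AI pipeline.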

Estimation accuracy

Preflight uses estimated units because output token count is unknown before the call:

| Provider | What we estimate | Accuracy |
|---|---|---|
| Gemini text | input tokens (`len/4`), output tokens (caller hint) | ±30% |
| Gemini video | input tokens (file-size table), output tokens (hint) | ±50% |
| ElevenLabs | chars (exact, `len(text)`) | 100% |
| Google TTS | chars (exact, `len(text)`) | 100% |

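Over-estimation is safer than under-estimation for budget enforcement: an over-estimate can only reject a call that would have fit, while an under-estimate can let spend exceed the budget. If you consistently over-estimate by around 50%, tune the default estimated_output_tokens hint downward.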
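
A minimal sketch of the estimators described above, assuming the `len/4` heuristic means roughly four characters per input token. The function names are hypothetical, and `relative_error` is just one way to check whether the output hint needs tuning.

```python
from typing import Tuple


def estimate_text_tokens(prompt: str, estimated_output_tokens: int) -> Tuple[int, int]:
    """Return (input_tokens, output_tokens) estimates for a text call.

    Input tokens use the len/4 heuristic; output tokens are the caller's hint,
    since the real count is unknown until the response arrives.
    """
    return len(prompt) // 4, estimated_output_tokens


def estimate_tts_chars(text: str) -> int:
    """Character-billed providers can be estimated exactly."""
    return len(text)


def relative_error(estimated: float, actual: float) -> float:
    """Positive means over-estimation, e.g. 0.5 is a 50% over-estimate."""
    return (estimated - actual) / actual
```

Comparing `relative_error` across recorded calls shows whether the estimates skew high: a value that stays near 0.5 suggests lowering the output hint.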

Hard limit mechanics

- Preflight computes current_month_spend + estimated_cost.
- If this sum exceeds budget.amount_usd AND budget.hard_limit=True → allow=false.
- The budget check is eventually consistent, not transactional: it reads from pre-aggregated rollups plus today's raw events, so brief overage is possible under high concurrency.
- This is acceptable for AI cost tracking: exact-to-the-cent enforcement would require distributed locks and add unacceptable latency.
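
The decision rule can be written as a small function. This sketch reuses the field names from the text (`amount_usd`, `hard_limit`); the `Budget` dataclass and `preflight_allows` are hypothetical, and how current_month_spend is aggregated is left out.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    amount_usd: float
    hard_limit: bool


def preflight_allows(
    current_month_spend: float, estimated_cost: float, budget: Budget
) -> bool:
    """Deny only when the projected spend exceeds a budget marked as a hard limit."""
    projected = current_month_spend + estimated_cost
    over_budget = projected > budget.amount_usd
    return not (over_budget and budget.hard_limit)
```

Because the check is not transactional, two concurrent requests that each read a spend of 98.0 against a 100.0 budget with an estimated cost of 1.5 would both be allowed, leaving the month at 101.0. That is the brief overage described above.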

Projects using this pattern

wiki/projects-overview/ai-cost-tracker (defines the pattern)
video-accessibility (first consumer, Phase 1)
