obsidian/wiki/concepts/preflight-record-pattern.md
2026-04-27 11:11:54 +01:00

2.5 KiB

title tags created updated
Preflight + Record Pattern
concept
ai
cost-tracking
patterns
2026-04-27 2026-04-27

Preflight + Record Pattern

The core usage-tracking pattern used by the AI Cost Tracker SDK. Every paid AI call follows the same three steps.

The pattern

preflight(estimated_units) → call AI → record(actual_units)
  1. Preflight — before the AI call, ask the cost-tracker: "Is this workspace/project within budget?"

    • Input: model name + estimated units (tokens, chars, etc.)
    • Output: allow=true/false, estimated cost, request_id
    • If allow=false → raise BudgetExceeded before calling the AI API
  2. AI call — the actual paid API call (unmodified)

  3. Record — after the call, report actual usage

    • Input: request_id from preflight + actual units from response
    • Output: event_id, cost_usd
    • If cost-tracker is unavailable → SDK saves to SQLite outbox and retries in background (see wiki/concepts/sync-with-outbox)

Why two steps?

  • Preflight enables hard budget enforcement before money is spent
  • Record captures accurate actual usage (estimated ≠ actual for output tokens)
  • Decoupling protects the AI pipeline: if cost-tracker goes down after preflight, record() still succeeds via outbox

Estimation accuracy

Preflight uses estimated units because output token count is unknown before the call:

Provider What we estimate Accuracy
Gemini text input tokens (len/4), output tokens (caller hint) ±30%
Gemini video input tokens (file-size table), output tokens (hint) ±50%
ElevenLabs chars (exact — len(text)) 100%
Google TTS chars (exact — len(text)) 100%

Over-estimation is better than under-estimation for budget enforcement. If you consistently over-estimate by 50%, tune the default estimated_output_tokens hint downward.

Hard limit mechanics

  • Preflight computes current_month_spend + estimated_cost
  • If this exceeds budget.amount_usd AND budget.hard_limit=Trueallow=false
  • The budget check is eventual (reads from pre-aggregated rollups + today's raw events), not transactional — brief overage is possible under high concurrency
  • This is acceptable for AI cost tracking: exact-to-the-cent enforcement would require distributed locks and add unacceptable latency

Projects using this pattern