2.5 KiB
2.5 KiB
| title | tags | created | updated | ||||
|---|---|---|---|---|---|---|---|
| Preflight + Record Pattern |
|
2026-04-27 | 2026-04-27 |
Preflight + Record Pattern
The core usage-tracking pattern used by the AI Cost Tracker SDK. Every paid AI call follows the same three steps.
The pattern
preflight(estimated_units) → call AI → record(actual_units)
-
Preflight — before the AI call, ask the cost-tracker: "Is this workspace/project within budget?"
- Input: model name + estimated units (tokens, chars, etc.)
- Output:
allow=true/false, estimated cost,request_id - If
allow=false→ raiseBudgetExceededbefore calling the AI API
-
AI call — the actual paid API call (unmodified)
-
Record — after the call, report actual usage
- Input:
request_idfrom preflight + actual units from response - Output:
event_id,cost_usd - If cost-tracker is unavailable → SDK saves to SQLite outbox and retries in background (see wiki/concepts/sync-with-outbox)
- Input:
Why two steps?
- Preflight enables hard budget enforcement before money is spent
- Record captures accurate actual usage (estimated ≠ actual for output tokens)
- Decoupling protects the AI pipeline: if cost-tracker goes down after preflight,
record()still succeeds via outbox
Estimation accuracy
Preflight uses estimated units because output token count is unknown before the call:
| Provider | What we estimate | Accuracy |
|---|---|---|
| Gemini text | input tokens (len/4), output tokens (caller hint) |
±30% |
| Gemini video | input tokens (file-size table), output tokens (hint) | ±50% |
| ElevenLabs | chars (exact — len(text)) |
100% |
| Google TTS | chars (exact — len(text)) |
100% |
Over-estimation is better than under-estimation for budget enforcement. If you consistently over-estimate by 50%, tune the default estimated_output_tokens hint downward.
Hard limit mechanics
- Preflight computes
current_month_spend + estimated_cost - If this exceeds
budget.amount_usdANDbudget.hard_limit=True→allow=false - The budget check is eventual (reads from pre-aggregated rollups + today's raw events), not transactional — brief overage is possible under high concurrency
- This is acceptable for AI cost tracking: exact-to-the-cent enforcement would require distributed locks and add unacceptable latency
Projects using this pattern
- wiki/projects-overview/ai-cost-tracker (defines the pattern)
- video-accessibility (first consumer, Phase 1)