59 lines
2.5 KiB
Markdown
59 lines
2.5 KiB
Markdown
---
|
|
title: Preflight + Record Pattern
|
|
tags: [concept, ai, cost-tracking, patterns]
|
|
created: 2026-04-27
|
|
updated: 2026-04-27
|
|
---
|
|
|
|
# Preflight + Record Pattern
|
|
|
|
The core usage-tracking pattern used by the AI Cost Tracker SDK. Every paid AI call follows the same three steps.
|
|
|
|
## The pattern
|
|
|
|
```
|
|
preflight(estimated_units) → call AI → record(actual_units)
|
|
```
|
|
|
|
1. **Preflight** — before the AI call, ask the cost-tracker: "Is this workspace/project within budget?"
|
|
- Input: model name + estimated units (tokens, chars, etc.)
|
|
- Output: `allow=true/false`, estimated cost, `request_id`
|
|
- If `allow=false` → raise `BudgetExceeded` before calling the AI API
|
|
|
|
2. **AI call** — the actual paid API call (unmodified)
|
|
|
|
3. **Record** — after the call, report actual usage
|
|
- Input: `request_id` from preflight + actual units from response
|
|
- Output: `event_id`, `cost_usd`
|
|
- If cost-tracker is unavailable → SDK saves to SQLite outbox and retries in background (see [[wiki/concepts/sync-with-outbox|sync-with-outbox]])
|
|
|
|
## Why two steps?
|
|
|
|
- **Preflight** enables hard budget enforcement **before** money is spent
|
|
- **Record** captures accurate actual usage (estimated ≠ actual for output tokens)
|
|
- Decoupling protects the AI pipeline: if cost-tracker goes down after preflight, `record()` still succeeds via outbox
|
|
|
|
## Estimation accuracy
|
|
|
|
Preflight uses estimated units because output token count is unknown before the call:
|
|
|
|
| Provider | What we estimate | Accuracy |
|
|
|---|---|---|
|
|
| Gemini text | input tokens (`len/4`), output tokens (caller hint) | ±30% |
|
|
| Gemini video | input tokens (file-size table), output tokens (hint) | ±50% |
|
|
| ElevenLabs | chars (exact — `len(text)`) | 100% |
|
|
| Google TTS | chars (exact — `len(text)`) | 100% |
|
|
|
|
Over-estimation is better than under-estimation for budget enforcement. If you consistently over-estimate by 50%, tune the default `estimated_output_tokens` hint downward.
|
|
|
|
## Hard limit mechanics
|
|
|
|
- Preflight computes `current_month_spend + estimated_cost`
|
|
- If this exceeds `budget.amount_usd` AND `budget.hard_limit=True` → `allow=false`
|
|
- The budget check is **eventual** (reads from pre-aggregated rollups + today's raw events), not transactional — brief overage is possible under high concurrency
|
|
- This is acceptable for AI cost tracking: exact-to-the-cent enforcement would require distributed locks and add unacceptable latency
|
|
|
|
## Projects using this pattern
|
|
|
|
- [[wiki/projects-overview/ai-cost-tracker|ai-cost-tracker]] (defines the pattern)
|
|
- video-accessibility (first consumer, Phase 1)
|