| title | tags | created | updated |
|---|---|---|---|
| AI Cost Tracker — Billing Units per Provider | | 2026-04-27 | 2026-04-27 |
Billing Units per Provider
Reference for how each AI provider bills and how to extract usage data from their API responses.
Gemini (Google AI / Vertex AI)
Billing unit: tokens (input + output separately)
SDK: google-genai Python SDK
How to get usage:
response = await client.aio.models.generate_content(...)  # async calls go through client.aio (assuming client = genai.Client())
input_tokens = response.usage_metadata.prompt_token_count
output_tokens = response.usage_metadata.candidates_token_count
total_tokens = response.usage_metadata.total_token_count
⚠️ usage_metadata is available on all generate_content responses, including multimodal ones (video + text prompts). video-accessibility did not read it before the cost-tracker integration; reading it was added as part of Phase 1.
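The three reads above can be wrapped in one small helper so every call site records usage the same way. This is a minimal sketch: `TokenUsage` and `read_gemini_usage` are hypothetical names, and treating a missing or empty field as 0 is a defensive assumption, not documented cost-tracker behaviour. The `SimpleNamespace` object only stands in for a real response.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class TokenUsage:
    input_tokens: int
    output_tokens: int
    total_tokens: int


def read_gemini_usage(response) -> TokenUsage:
    """Copy token counts off a response, treating missing values as 0."""
    meta = getattr(response, "usage_metadata", None)
    input_tokens = getattr(meta, "prompt_token_count", None) or 0
    output_tokens = getattr(meta, "candidates_token_count", None) or 0
    # Fall back to the sum if the total is absent (assumption for this sketch).
    total_tokens = getattr(meta, "total_token_count", None) or (
        input_tokens + output_tokens
    )
    return TokenUsage(input_tokens, output_tokens, total_tokens)


# Stand-in for a real generate_content response, for illustration only.
fake_response = SimpleNamespace(
    usage_metadata=SimpleNamespace(
        prompt_token_count=1200,
        candidates_token_count=85,
        total_token_count=1285,
    )
)
usage = read_gemini_usage(fake_response)
```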
Token estimation before the call:
- Text: len(text) / 4 (rough heuristic; actual tokenisation varies ±30%)
- Video file: use Google's published token table:
  - < 1 min video ≈ 1,000–2,000 tokens + audio
  - Exact: check google.genai file metadata after upload
- Image: ~258 tokens per 512×512 tile
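The text and image heuristics above can be sketched as plain functions. The function names are hypothetical, the ±30% band and the 258-tokens-per-512×512-tile figure are taken from the list above as-is, and rounding partial tiles up is an assumption of this sketch.

```python
import math


def estimate_text_tokens(text: str) -> int:
    """Rough pre-call estimate: one token per four characters, rounded up."""
    return math.ceil(len(text) / 4)


def estimate_text_token_range(text: str, spread_pct: int = 30) -> tuple[int, int]:
    """Low/high bounds around the heuristic, using integer arithmetic."""
    chars = len(text)
    low = (chars * (100 - spread_pct)) // 400
    high = -((-chars * (100 + spread_pct)) // 400)  # ceiling division
    return low, high


def estimate_image_tokens(
    width: int, height: int, tile: int = 512, tokens_per_tile: int = 258
) -> int:
    """Count the tiles needed to cover the image; partial tiles count as whole."""
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return tiles * tokens_per_tile
```

These estimates are only useful for budgeting before a call; the recorded cost should still come from usage_metadata on the response.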
Pricing: auto-synced from LiteLLM. See wiki/tech-patterns/cost-tracker-pricing-sources.
Gemini TTS (audio generation via generate_content)
Billing unit: tokens (output audio tokens, different rate from text)
SDK: same google-genai, with response_modalities=["AUDIO"]
How to get usage:
from google.genai import types
response = await client.aio.models.generate_content(  # async calls go through client.aio
    model="gemini-2.5-flash-preview-tts",
    contents=...,
    config=types.GenerateContentConfig(response_modalities=["AUDIO"]),
)
output_tokens = response.usage_metadata.candidates_token_count
The audio output token rate differs from the text output rate; verify it in LiteLLM for model gemini-2.5-flash-preview-tts.
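Because input and output tokens are billed at different rates, cost has to be computed per side rather than from the total. A minimal sketch follows; `token_cost` is a hypothetical helper and both rates are placeholders chosen for easy arithmetic, not real prices (real values come from the LiteLLM sync).

```python
from decimal import Decimal


def token_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_1m: Decimal,
    output_rate_per_1m: Decimal,
) -> Decimal:
    """Cost in USD when input and output tokens have separate per-1M rates."""
    million = Decimal(1_000_000)
    return (
        Decimal(input_tokens) * input_rate_per_1m
        + Decimal(output_tokens) * output_rate_per_1m
    ) / million


# PLACEHOLDER rates for illustration only; not actual Gemini TTS pricing.
PLACEHOLDER_TEXT_INPUT_RATE = Decimal("0.50")
PLACEHOLDER_AUDIO_OUTPUT_RATE = Decimal("10.00")

cost = token_cost(
    input_tokens=200,
    output_tokens=3_000,
    input_rate_per_1m=PLACEHOLDER_TEXT_INPUT_RATE,
    output_rate_per_1m=PLACEHOLDER_AUDIO_OUTPUT_RATE,
)
```

Decimal is used instead of float so that small per-call amounts add up without rounding drift.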
ElevenLabs TTS
Billing unit: characters (input text length)
SDK: custom HTTP (aiohttp POST to https://api.elevenlabs.io/v1/text-to-speech/{voice_id})
Response: returns raw audio bytes. No usage metadata in response.
How to measure: compute len(text) at the call site before making the request:
char_count = len(text)
# make the ElevenLabs call
await ct.record(..., chars=char_count, model="eleven_multilingual_v2", ...)
Subscription vs pay-as-you-go: ElevenLabs bills against a monthly character quota. When the quota is exceeded, the pay-as-you-go rate applies. The cost-tracker assumes the pay-as-you-go rate for all characters (a conservative upper bound). Adjust via admin override if on a subscription plan.
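The difference between the conservative upper bound and a quota-aware figure can be illustrated as follows. This is a sketch only: both function names, the per-1,000-character rate unit, and the idea of passing in the characters already used this month are assumptions, not how the cost-tracker or the admin override is implemented.

```python
from decimal import Decimal


def upper_bound_cost(chars: int, payg_rate_per_1k: Decimal) -> Decimal:
    """Bill every character at the pay-as-you-go rate (the tracker's default)."""
    return Decimal(chars) * payg_rate_per_1k / Decimal(1000)


def overage_cost(
    chars_this_call: int,
    chars_used_before: int,
    monthly_quota: int,
    payg_rate_per_1k: Decimal,
) -> Decimal:
    """Bill only the characters that fall outside the remaining monthly quota."""
    remaining = max(monthly_quota - chars_used_before, 0)
    billable = max(chars_this_call - remaining, 0)
    return Decimal(billable) * payg_rate_per_1k / Decimal(1000)


# PLACEHOLDER rate for illustration; not an actual ElevenLabs price.
PLACEHOLDER_RATE = Decimal("0.30")
```

The upper bound is always at least as large as the overage figure, which is why it is safe as a default.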
Google Cloud TTS
Billing unit: characters (input text length, after SSML stripping)
SDK: google.cloud.texttospeech Python SDK
Response: SynthesizeSpeechResponse with audio_content (bytes). No character count in response.
How to measure:
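# SynthesisInput carries either .text or .ssml; the unset one is an empty string
char_count = len(synthesis_input.text or synthesis_input.ssml)
# for SSML, Google bills the stripped char count; len(ssml) includes markup, so it is an approximation (upper bound)
await ct.record(..., chars=char_count, model="standard", ...)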
Voice tiers and pricing:
| Voice type | Billing model name | Price per 1M chars |
|---|---|---|
| Standard | google_tts/standard | $4.00 |
| WaveNet | google_tts/wavenet | $16.00 |
| Neural2 | google_tts/neural2 | $16.00 |
| Studio | google_tts/studio | $160.00 |
Defined in pricing/models.yaml in the cost-tracker repo.
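As an illustration of how the tiers translate into a per-call cost, the sketch below hard-codes the four prices from the table. The dictionary and the `google_tts_cost` function are hypothetical; the real values live in pricing/models.yaml, whose schema is not shown here.

```python
from decimal import Decimal

# Prices per 1M characters, copied from the table above (USD).
GOOGLE_TTS_PRICE_PER_1M_CHARS = {
    "google_tts/standard": Decimal("4.00"),
    "google_tts/wavenet": Decimal("16.00"),
    "google_tts/neural2": Decimal("16.00"),
    "google_tts/studio": Decimal("160.00"),
}


def google_tts_cost(chars: int, billing_model: str) -> Decimal:
    """Cost in USD for a given character count and billing model name."""
    try:
        price = GOOGLE_TTS_PRICE_PER_1M_CHARS[billing_model]
    except KeyError:
        raise ValueError(f"unknown billing model: {billing_model}") from None
    return Decimal(chars) * price / Decimal(1_000_000)
```

For example, 250,000 characters on a Neural2 voice come to a quarter of the $16.00 per-million price.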
OpenAI (future)
Billing unit: tokens (input + output)
response = client.chat.completions.create(...)
input_tokens = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens
Pricing: auto-synced from LiteLLM.
Anthropic Claude (future)
Billing unit: tokens (input + output)
response = client.messages.create(...)
input_tokens = response.usage.input_tokens
output_tokens = response.usage.output_tokens
Pricing: auto-synced from LiteLLM.
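OpenAI and Anthropic expose the same information under different field names, so a small mapping can normalise them before recording. This is a sketch under assumptions: `USAGE_FIELDS` and `read_usage` are hypothetical, and the `SimpleNamespace` objects stand in for real SDK responses.

```python
from types import SimpleNamespace

# provider -> (input-token field, output-token field) on response.usage
USAGE_FIELDS = {
    "openai": ("prompt_tokens", "completion_tokens"),
    "anthropic": ("input_tokens", "output_tokens"),
}


def read_usage(provider: str, response) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) regardless of provider naming."""
    input_field, output_field = USAGE_FIELDS[provider]
    usage = response.usage
    return getattr(usage, input_field), getattr(usage, output_field)


# Stand-ins for real responses, for illustration only.
openai_like = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=120, completion_tokens=45)
)
anthropic_like = SimpleNamespace(
    usage=SimpleNamespace(input_tokens=300, output_tokens=80)
)
```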
Whisper (self-hosted)
Not billed per token. Runs on Cloud Run / GPU compute.
Billing = infrastructure cost (compute time). Phase 1 does not track this.
Future Phase 2: track audio_duration_seconds and approximate cost from Cloud Run billing data.
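One possible shape for the Phase 2 approximation is to split a billing-period compute bill across jobs in proportion to their audio duration. This is only a sketch of that idea: `apportion_compute_cost` is a hypothetical name, and proportional allocation by audio_duration_seconds is an assumption, since Phase 2 is not yet designed.

```python
from decimal import Decimal


def apportion_compute_cost(
    total_bill: Decimal, audio_seconds_by_job: dict[str, float]
) -> dict[str, Decimal]:
    """Split a compute bill across jobs in proportion to audio duration."""
    total_seconds = sum(audio_seconds_by_job.values())
    if total_seconds <= 0:
        # No tracked audio: nothing to apportion.
        return {job: Decimal(0) for job in audio_seconds_by_job}
    total = Decimal(str(total_seconds))
    return {
        job: total_bill * Decimal(str(seconds)) / total
        for job, seconds in audio_seconds_by_job.items()
    }


# Illustrative numbers only: a 30.00 bill split over two jobs.
shares = apportion_compute_cost(Decimal("30.00"), {"job-a": 600, "job-b": 1800})
```

Idle instance time is folded into every job's share under this scheme, which makes the per-job figure an over-estimate of marginal cost.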