---
title: AI Cost Tracker — Billing Units per Provider
tags: [reference, ai, cost-tracking, providers]
created: 2026-04-27
updated: 2026-04-27
---

# Billing Units per Provider

Reference for how each AI provider bills and how to extract usage data from their API responses.
## Gemini (Google AI / Vertex AI)

**Billing unit:** tokens (input and output billed separately)

**SDK:** `google-genai` Python SDK

**How to get usage:**
```python
# Async calls go through client.aio (assuming `client` is a genai.Client);
# client.models.generate_content is the synchronous variant.
response = await client.aio.models.generate_content(...)

input_tokens = response.usage_metadata.prompt_token_count
output_tokens = response.usage_metadata.candidates_token_count
total_tokens = response.usage_metadata.total_token_count
```
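
A defensive variant of the snippet above, sketched under the assumption that `usage_metadata` or one of its counters can be missing or `None` on some responses. The helper name `extract_gemini_usage` is hypothetical and is not part of the SDK or the cost-tracker. It reads only the three fields shown above and treats absent values as 0, so a missing counter does not break cost recording.

```python
from types import SimpleNamespace
from typing import Any


def extract_gemini_usage(response: Any) -> dict[str, int]:
    """Return token counts from a generate_content response, with 0 for missing values."""
    meta = getattr(response, "usage_metadata", None)

    def count(name: str) -> int:
        # A field may be absent or None; fall back to 0 in both cases.
        return getattr(meta, name, None) or 0

    return {
        "input_tokens": count("prompt_token_count"),
        "output_tokens": count("candidates_token_count"),
        "total_tokens": count("total_token_count"),
    }


# Stand-in for a real response object, for illustration only.
fake_response = SimpleNamespace(
    usage_metadata=SimpleNamespace(
        prompt_token_count=120,
        candidates_token_count=None,
        total_token_count=120,
    )
)
usage = extract_gemini_usage(fake_response)
```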
> ⚠️ `usage_metadata` is available on all `generate_content` responses, including multimodal ones (video + text prompts). It was **not being read** in video-accessibility before the cost-tracker integration; reading it was added as part of Phase 1.

**Token estimation before the call:**
- Text: `len(text) / 4` (rough heuristic; actual tokenisation varies by about ±30%)
- Video file: use Google's published token table:
  - < 1 min video ≈ 1,000–2,000 tokens + audio
  - Exact: check `google.genai` file metadata after upload
- Image: ~258 tokens per 512×512 tile
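
The text and image heuristics listed above can be sketched as follows. The constants are copied from this reference. The function names and the choice to round partial tiles up are assumptions for illustration, and the results are pre-call estimates only; the billed figures still come from `usage_metadata`.

```python
import math

CHARS_PER_TOKEN = 4          # rough heuristic from this reference
TOKENS_PER_IMAGE_TILE = 258  # per tile, as listed in this reference
TILE_SIZE_PX = 512           # tile edge length, as listed in this reference


def estimate_text_tokens(text: str) -> int:
    """Estimate tokens for a text prompt using the len(text) / 4 heuristic."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Estimate tokens for an image, assuming partial tiles count as full tiles."""
    tiles = math.ceil(width_px / TILE_SIZE_PX) * math.ceil(height_px / TILE_SIZE_PX)
    # Even a very small image is assumed to occupy at least one tile.
    return max(tiles, 1) * TOKENS_PER_IMAGE_TILE
```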

**Pricing:** auto-synced from LiteLLM. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].

---
## Gemini TTS (audio generation via generate_content)

**Billing unit:** tokens (output audio tokens, billed at a different rate from text)

**SDK:** same `google-genai`, with `response_modalities=["AUDIO"]`

**How to get usage:**
```python
from google.genai.types import GenerateContentConfig

# Async calls go through client.aio (assuming `client` is a genai.Client).
response = await client.aio.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=...,
    config=GenerateContentConfig(response_modalities=["AUDIO"]),
)
output_tokens = response.usage_metadata.candidates_token_count
```

The audio output token rate differs from the text output rate; verify it in LiteLLM for model `gemini-2.5-flash-preview-tts`.

---
## ElevenLabs TTS

**Billing unit:** characters (input text length)

**SDK:** custom HTTP (`aiohttp` POST to `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}`)

**Response:** returns raw audio bytes. **No usage metadata in response.**

**How to measure:** compute `len(text)` at the call site **before** making the request:

```python
char_count = len(text)  # billed unit: characters of input text
# make the ElevenLabs call
await ct.record(..., chars=char_count, model="eleven_multilingual_v2", ...)
```

**Subscription vs pay-as-you-go:** ElevenLabs bills against a monthly character quota. Once the quota is exceeded, the pay-as-you-go rate applies. The cost-tracker prices all characters at the pay-as-you-go rate, which gives a conservative upper bound. Adjust via admin override if on a subscription plan.

---
## Google Cloud TTS

**Billing unit:** characters (input text length, after SSML stripping)

**SDK:** `google.cloud.texttospeech` Python SDK

**Response:** `SynthesizeSpeechResponse` with `audio_content` (bytes). **No character count in response.**

**How to measure:**
```python
char_count = len(synthesis_input.text)
# For SSML input, Google bills the stripped character count; approximate it with len(ssml).
await ct.record(..., chars=char_count, model="standard", ...)
```
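
Once the character count is recorded, the cost follows from the per-million-character price of the voice tier listed in the table in this section. A minimal sketch of that arithmetic, with the prices copied from the table; the dictionary and the function name `google_tts_cost_usd` are hypothetical, and the authoritative values live in `pricing/models.yaml`.

```python
from decimal import Decimal

# Prices per 1M characters, copied from the voice tier table in this reference.
PRICE_PER_MILLION_CHARS = {
    "google_tts/standard": Decimal("4.00"),
    "google_tts/wavenet": Decimal("16.00"),
    "google_tts/neural2": Decimal("16.00"),
    "google_tts/studio": Decimal("160.00"),
}


def google_tts_cost_usd(chars: int, model: str) -> Decimal:
    """Cost in USD for `chars` billed characters on the given billing model name."""
    if chars < 0:
        raise ValueError("chars must be non-negative")
    # Decimal avoids float rounding drift when many small costs are summed.
    return Decimal(chars) * PRICE_PER_MILLION_CHARS[model] / Decimal(1_000_000)
```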

**Voice tiers and pricing:**

| Voice type | Billing model name | Price per 1M chars |
|---|---|---|
| Standard | `google_tts/standard` | $4.00 |
| WaveNet | `google_tts/wavenet` | $16.00 |
| Neural2 | `google_tts/neural2` | $16.00 |
| Studio | `google_tts/studio` | $160.00 |

Defined in `pricing/models.yaml` in the cost-tracker repo.

---
## OpenAI (future)

**Billing unit:** tokens (input + output)

```python
response = client.chat.completions.create(...)
input_tokens = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens
```

Pricing is auto-synced from LiteLLM.

---
## Anthropic Claude (future)

**Billing unit:** tokens (input + output)

```python
response = client.messages.create(...)
input_tokens = response.usage.input_tokens
output_tokens = response.usage.output_tokens
```
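
Gemini, OpenAI, and Anthropic expose the same two numbers under different attribute names. One way to hide that difference at the call site is a small lookup table, sketched here. `TokenUsage` and `normalize_usage` are hypothetical names, not part of any SDK or of the cost-tracker, and the sketch assumes missing counters should be recorded as 0.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass(frozen=True)
class TokenUsage:
    provider: str
    input_tokens: int
    output_tokens: int


# provider -> (response attribute holding usage, input field, output field),
# using the field names documented in this reference.
_USAGE_FIELDS = {
    "gemini": ("usage_metadata", "prompt_token_count", "candidates_token_count"),
    "openai": ("usage", "prompt_tokens", "completion_tokens"),
    "anthropic": ("usage", "input_tokens", "output_tokens"),
}


def normalize_usage(provider: str, response: Any) -> TokenUsage:
    """Map a provider response to a common TokenUsage record."""
    container, input_field, output_field = _USAGE_FIELDS[provider]
    usage = getattr(response, container, None)
    return TokenUsage(
        provider=provider,
        input_tokens=getattr(usage, input_field, None) or 0,
        output_tokens=getattr(usage, output_field, None) or 0,
    )


# Stand-in response objects, for illustration only.
openai_like = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=7, completion_tokens=3))
anthropic_like = SimpleNamespace(usage=SimpleNamespace(input_tokens=4, output_tokens=2))
```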

Pricing is auto-synced from LiteLLM.

---
## Whisper (self-hosted)

**Not billed per token.** Runs on Cloud Run / GPU compute.

Billing is infrastructure cost (compute time). Phase 1 does not track this.
Future Phase 2: track `audio_duration_seconds` and approximate cost from Cloud Run billing data.
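
One possible shape for the Phase 2 approximation, sketched under a strong assumption: that a billing period's total compute cost can be split across transcription jobs in proportion to their `audio_duration_seconds`. The function name and the proportional rule are illustrative only; nothing here reflects a decided design.

```python
def apportion_compute_cost(
    total_cost_usd: float, durations_s: dict[str, float]
) -> dict[str, float]:
    """Split a period's compute cost across jobs by share of audio duration."""
    total_duration = sum(durations_s.values())
    if total_duration <= 0:
        # No audio processed: attribute nothing to any job.
        return {job_id: 0.0 for job_id in durations_s}
    return {
        job_id: total_cost_usd * duration / total_duration
        for job_id, duration in durations_s.items()
    }
```

A proportional split ignores idle time and cold starts, so it is best read as a rough allocation rather than a measured per-job cost.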