---
title: AI Cost Tracker — Billing Units per Provider
tags: [reference, ai, cost-tracking, providers]
created: 2026-04-27
updated: 2026-04-27
---

# Billing Units per Provider

Reference for how each AI provider bills and how to extract usage data from their API responses.

## Gemini (Google AI / Vertex AI)

**Billing unit:** tokens (input + output separately)
**SDK:** `google-genai` Python SDK

**How to get usage:**

```python
# `client` is assumed to be the SDK's async client (e.g. `genai.Client().aio`),
# which is what makes this call awaitable.
response = await client.models.generate_content(...)
input_tokens = response.usage_metadata.prompt_token_count
output_tokens = response.usage_metadata.candidates_token_count
total_tokens = response.usage_metadata.total_token_count
```

> ⚠️ `usage_metadata` is available on all `generate_content` responses, including multimodal ones (video + text prompts). It was **not being read** in video-accessibility before the cost-tracker integration; reading it was added as part of Phase 1.

**Token estimation before the call:**

- Text: `len(text) / 4` (rough heuristic; actual tokenisation varies ±30%)
- Video file: use Google's published token table:
  - < 1 min video ≈ 1,000–2,000 tokens + audio
  - Exact: check `google.genai` file metadata after upload
- Image: ~258 tokens per 512×512 tile

**Pricing:** auto-synced from LiteLLM. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].

---

## Gemini TTS (audio generation via generate_content)

**Billing unit:** tokens (output audio tokens, different rate from text)
**SDK:** same `google-genai`, with `response_modalities=["AUDIO"]`

**How to get usage:**

```python
from google.genai.types import GenerateContentConfig

# Same async `client` as in the Gemini section above.
response = await client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=...,
    config=GenerateContentConfig(response_modalities=["AUDIO"]),
)
output_tokens = response.usage_metadata.candidates_token_count
```

Audio output token rate differs from text output rate; verify in LiteLLM for model `gemini-2.5-flash-preview-tts`.
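
Both Gemini sections read the same three `usage_metadata` fields, and the SDK may leave individual counts unset. The following is a minimal sketch of a defensive reader, not the cost-tracker's actual code: `TokenUsage` and `read_gemini_usage` are hypothetical names, and falling back to input + output when the total is missing is an assumption. A stand-in object replaces the real response so the example runs without the SDK.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass(frozen=True)
class TokenUsage:
    """Hypothetical container for the three counts the tracker records."""
    input_tokens: int
    output_tokens: int
    total_tokens: int


def read_gemini_usage(response: Any) -> TokenUsage:
    """Read token counts from a generate_content response, treating missing values as 0."""
    meta = getattr(response, "usage_metadata", None)
    input_tokens = getattr(meta, "prompt_token_count", None) or 0
    output_tokens = getattr(meta, "candidates_token_count", None) or 0
    # Assumption: if the total is missing, approximate it as input + output.
    total_tokens = getattr(meta, "total_token_count", None) or (input_tokens + output_tokens)
    return TokenUsage(input_tokens, output_tokens, total_tokens)


# Stand-in with the same attribute names as the real response object.
fake_response = SimpleNamespace(
    usage_metadata=SimpleNamespace(
        prompt_token_count=1200,
        candidates_token_count=None,
        total_token_count=None,
    )
)
usage = read_gemini_usage(fake_response)
print(usage)
```

Returning zeros instead of raising keeps a missing count from failing the surrounding request; whether that trade-off suits the tracker is a design choice this sketch does not settle.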

---

## ElevenLabs TTS

**Billing unit:** characters (input text length)
**SDK:** custom HTTP (`aiohttp` POST to `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}`)
**Response:** returns raw audio bytes. **No usage metadata in response.**

**How to measure:** compute `len(text)` at the call site **before** making the request:

```python
char_count = len(text)
# make the ElevenLabs call
await ct.record(..., chars=char_count, model="eleven_multilingual_v2", ...)
```

**Subscription vs pay-as-you-go:** ElevenLabs bills against a monthly character quota. When quota is exceeded, pay-as-you-go rate applies. The cost-tracker assumes pay-as-you-go for all characters (conservative upper bound). Adjust via admin override if on a subscription plan.

---

## Google Cloud TTS

**Billing unit:** characters (input text length, after SSML stripping)
**SDK:** `google.cloud.texttospeech` Python SDK
**Response:** `SynthesizeSpeechResponse` with `audio_content` (bytes). **No character count in response.**

**How to measure:**

```python
char_count = len(synthesis_input.text)
# for SSML Google bills stripped char count; approximate with len(ssml)
await ct.record(..., chars=char_count, model="standard", ...)
```

**Voice tiers and pricing:**

| Voice type | Billing model name | Price per 1M chars |
|---|---|---|
| Standard | `google_tts/standard` | $4.00 |
| WaveNet | `google_tts/wavenet` | $16.00 |
| Neural2 | `google_tts/neural2` | $16.00 |
| Studio | `google_tts/studio` | $160.00 |

Defined in `pricing/models.yaml` in the cost-tracker repo.

---

## OpenAI (future)

**Billing unit:** tokens (input + output)

```python
response = client.chat.completions.create(...)
input_tokens = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens
```

Auto-synced by LiteLLM.

---

## Anthropic Claude (future)

**Billing unit:** tokens (input + output)

```python
response = client.messages.create(...)
input_tokens = response.usage.input_tokens
output_tokens = response.usage.output_tokens
```

Auto-synced by LiteLLM.

---

## Whisper (self-hosted)

**Not billed per token.** Runs on Cloud Run / GPU compute. Billing = infrastructure cost (compute time). Phase 1 does not track this. Future Phase 2: track `audio_duration_seconds` and approximate cost from Cloud Run billing data.
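
One possible shape for the Phase 2 approximation is sketched below. It assumes cost scales linearly with processing time, that processing time is a fixed multiple of audio length, and that a per-second compute rate has been derived separately from Cloud Run billing data. `estimate_whisper_cost`, `realtime_factor`, and `compute_rate_per_second` are hypothetical names, and the numbers in the usage example are placeholders, not real prices or measurements.

```python
def estimate_whisper_cost(
    audio_duration_seconds: float,
    realtime_factor: float,
    compute_rate_per_second: float,
) -> float:
    """Approximate compute cost for one transcription.

    realtime_factor: processing seconds per second of audio (hypothetical input,
        to be measured on the actual deployment).
    compute_rate_per_second: cost of one second of compute (hypothetical input,
        to be derived from Cloud Run billing data).
    """
    if audio_duration_seconds < 0 or realtime_factor <= 0 or compute_rate_per_second < 0:
        raise ValueError("duration and rate must be non-negative; realtime_factor must be positive")
    processing_seconds = audio_duration_seconds * realtime_factor
    return processing_seconds * compute_rate_per_second


# Placeholder values only: 10 minutes of audio, processed 4x faster than real time.
cost = estimate_whisper_cost(
    audio_duration_seconds=600,
    realtime_factor=0.25,
    compute_rate_per_second=0.0005,
)
print(f"estimated cost: {cost:.4f}")
```

A linear model ignores cold starts and idle instance time, so it is likely to undercount on bursty workloads; reconciling the estimate against the actual billing export would show how far off it is.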