obsidian/wiki/tech-patterns/cost-tracker-providers.md
2026-04-27 11:11:54 +01:00


---
title: AI Cost Tracker — Billing Units per Provider
tags: [reference, ai, cost-tracking, providers]
created: 2026-04-27
updated: 2026-04-27
---
# Billing Units per Provider
Reference for how each AI provider bills and how to extract usage data from their API responses.
## Gemini (Google AI / Vertex AI)
**Billing unit:** tokens (input + output separately)
**SDK:** `google-genai` Python SDK
**How to get usage:**
```python
# The async API lives under `client.aio`; `client.models` is synchronous.
response = await client.aio.models.generate_content(...)
input_tokens = response.usage_metadata.prompt_token_count
output_tokens = response.usage_metadata.candidates_token_count
total_tokens = response.usage_metadata.total_token_count
```
> ⚠️ `usage_metadata` is available on all `generate_content` responses, including multimodal ones (video + text prompts). Before the cost-tracker integration, video-accessibility did **not** read it; reading it was added as part of Phase 1.
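
The individual counts on `usage_metadata` may be unset on some responses, so a defensive reader is useful. The following is a minimal sketch, not the cost-tracker's actual code: `TokenUsage` and `extract_gemini_usage` are hypothetical names, and the `SimpleNamespace` object only stands in for a real SDK response.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class TokenUsage:
    input_tokens: int
    output_tokens: int
    total_tokens: int


def extract_gemini_usage(response) -> TokenUsage:
    """Read token counts from a response, treating missing values as 0."""
    meta = getattr(response, "usage_metadata", None)
    input_tokens = getattr(meta, "prompt_token_count", None) or 0
    output_tokens = getattr(meta, "candidates_token_count", None) or 0
    # Fall back to the sum when the total is missing.
    total = getattr(meta, "total_token_count", None) or (input_tokens + output_tokens)
    return TokenUsage(input_tokens, output_tokens, total)


# Stand-in for a real SDK response (illustration only).
fake_response = SimpleNamespace(
    usage_metadata=SimpleNamespace(
        prompt_token_count=1200,
        candidates_token_count=None,
        total_token_count=1200,
    )
)
usage = extract_gemini_usage(fake_response)
```

Returning zeros instead of raising keeps cost recording from breaking the main request path; whether that trade-off is right depends on how strictly usage must be audited.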
**Token estimation before the call:**
- Text: `len(text) / 4` (rough heuristic; actual tokenisation varies ±30%)
- Video file: use Google's published token table:
  - < 1 min video: ~1,000–2,000 tokens + audio
  - Exact: check `google.genai` file metadata after upload
- Image: ~258 tokens per 512×512 tile
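The text and image heuristics above can be sketched as two small helpers. This is an illustrative sketch only: the function names are hypothetical, and rounding up to whole tiles in each dimension is an assumption, not something stated by the provider.

```python
import math

CHARS_PER_TOKEN = 4          # rough text heuristic from the list above
TOKENS_PER_IMAGE_TILE = 258  # per-tile figure from the list above
TILE_SIZE_PX = 512           # tile edge from the list above


def estimate_text_tokens(text: str) -> int:
    """Rough pre-call estimate; real tokenisation can differ noticeably."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Assumes the image is covered by whole tiles, rounding up per axis."""
    tiles_x = max(math.ceil(width_px / TILE_SIZE_PX), 1)
    tiles_y = max(math.ceil(height_px / TILE_SIZE_PX), 1)
    return tiles_x * tiles_y * TOKENS_PER_IMAGE_TILE
```

These estimates are only suitable for budget checks before a call; the recorded cost should still come from `usage_metadata` on the response.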
**Pricing:** auto-synced from LiteLLM. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].
---
## Gemini TTS (audio generation via generate_content)
**Billing unit:** tokens (output audio tokens, different rate from text)
**SDK:** same `google-genai`, with `response_modalities=["AUDIO"]`
**How to get usage:**
```python
from google.genai.types import GenerateContentConfig

# The async API lives under `client.aio`.
response = await client.aio.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=...,
    config=GenerateContentConfig(response_modalities=["AUDIO"]),
)
output_tokens = response.usage_metadata.candidates_token_count
```
The audio output token rate differs from the text output rate; verify it in LiteLLM for model `gemini-2.5-flash-preview-tts`.
---
## ElevenLabs TTS
**Billing unit:** characters (input text length)
**SDK:** custom HTTP (`aiohttp` POST to `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}`)
**Response:** returns raw audio bytes. **No usage metadata in response.**
**How to measure:** compute `len(text)` at the call site **before** making the request:
```python
char_count = len(text)
# make the ElevenLabs call
await ct.record(..., chars=char_count, model="eleven_multilingual_v2", ...)
```
**Subscription vs pay-as-you-go:** ElevenLabs bills against a monthly character quota. When quota is exceeded, pay-as-you-go rate applies. The cost-tracker assumes pay-as-you-go for all characters (conservative upper bound). Adjust via admin override if on a subscription plan.
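To make the difference between the two billing views concrete, here is a minimal sketch. The rate is a placeholder chosen for illustration, not a real ElevenLabs price, and `payg_cost` and `overage_cost` are hypothetical helpers rather than cost-tracker functions.

```python
from decimal import Decimal

# Placeholder rate for illustration only; NOT a real ElevenLabs price.
EXAMPLE_RATE_PER_1K_CHARS = Decimal("0.30")


def payg_cost(chars: int, rate_per_1k: Decimal = EXAMPLE_RATE_PER_1K_CHARS) -> Decimal:
    """Upper bound: every character is charged at the pay-as-you-go rate."""
    return Decimal(chars) / 1000 * rate_per_1k


def overage_cost(
    chars_used_this_month: int,
    monthly_quota: int,
    rate_per_1k: Decimal = EXAMPLE_RATE_PER_1K_CHARS,
) -> Decimal:
    """Subscription view: only characters beyond the quota are charged."""
    billable = max(chars_used_this_month - monthly_quota, 0)
    return payg_cost(billable, rate_per_1k)
```

With these placeholder numbers, 120,000 characters against a 100,000-character quota cost 6.00 in the subscription view but 36.00 under the upper-bound assumption, which is the gap an admin override would close.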
---
## Google Cloud TTS
**Billing unit:** characters (input text length, after SSML stripping)
**SDK:** `google.cloud.texttospeech` Python SDK
**Response:** `SynthesizeSpeechResponse` with `audio_content` (bytes). **No character count in response.**
**How to measure:**
```python
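# Plain-text input: count the text directly.
char_count = len(synthesis_input.text)
# SSML input: `text` is empty, so use len(synthesis_input.ssml) instead.
# That is an approximation; it includes markup the stripped count would drop.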
await ct.record(..., chars=char_count, model="standard", ...)
```
**Voice tiers and pricing:**
| Voice type | Billing model name | Price per 1M chars |
|---|---|---|
| Standard | `google_tts/standard` | $4.00 |
| WaveNet | `google_tts/wavenet` | $16.00 |
| Neural2 | `google_tts/neural2` | $16.00 |
| Studio | `google_tts/studio` | $160.00 |
Defined in `pricing/models.yaml` in the cost-tracker repo.
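The table translates directly into a per-request cost lookup. The sketch below hardcodes the table values for illustration; in the cost-tracker they come from `pricing/models.yaml`, and `google_tts_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Prices per 1M characters, copied from the table above.
PRICE_PER_1M_CHARS = {
    "google_tts/standard": Decimal("4.00"),
    "google_tts/wavenet": Decimal("16.00"),
    "google_tts/neural2": Decimal("16.00"),
    "google_tts/studio": Decimal("160.00"),
}


def google_tts_cost(chars: int, model: str) -> Decimal:
    """Cost in USD for `chars` characters; raises KeyError on unknown models."""
    return Decimal(chars) * PRICE_PER_1M_CHARS[model] / Decimal(1_000_000)
```

`Decimal` avoids float rounding drift when many small per-request costs are summed.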
---
## OpenAI (future)
**Billing unit:** tokens (input + output)
```python
response = client.chat.completions.create(...)
input_tokens = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens
```
**Pricing:** auto-synced from LiteLLM.
---
## Anthropic Claude (future)
**Billing unit:** tokens (input + output)
```python
response = client.messages.create(...)
input_tokens = response.usage.input_tokens
output_tokens = response.usage.output_tokens
```
**Pricing:** auto-synced from LiteLLM.
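Because OpenAI and Anthropic expose the same information under different field names, a small mapping can normalise them once these providers are integrated. This is a sketch under assumptions: `USAGE_FIELDS` and `extract_usage` are hypothetical names, and the `SimpleNamespace` objects stand in for real SDK responses.

```python
from types import SimpleNamespace

# (input field, output field) on `response.usage`, per the snippets above.
USAGE_FIELDS = {
    "openai": ("prompt_tokens", "completion_tokens"),
    "anthropic": ("input_tokens", "output_tokens"),
}


def extract_usage(provider: str, response) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) for a supported provider."""
    input_field, output_field = USAGE_FIELDS[provider]
    usage = response.usage
    return getattr(usage, input_field), getattr(usage, output_field)


# Stand-ins for real SDK responses (illustration only).
openai_like = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=10, completion_tokens=5))
anthropic_like = SimpleNamespace(usage=SimpleNamespace(input_tokens=7, output_tokens=3))
```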
---
## Whisper (self-hosted)
**Not billed per token.** Runs on Cloud Run / GPU compute.
Billing = infrastructure cost (compute time). Phase 1 does not track this.
Future Phase 2: track `audio_duration_seconds` and approximate cost from Cloud Run billing data.
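One possible shape for the Phase 2 approximation is to split a billing-period compute total across jobs in proportion to their `audio_duration_seconds`. This is a sketch of that idea only, not a decided design; `apportion_compute_cost` and the job IDs are hypothetical.

```python
def apportion_compute_cost(
    total_cost_usd: float,
    audio_seconds_by_job: dict[str, float],
) -> dict[str, float]:
    """Split a compute bill across jobs, proportional to audio duration."""
    total_seconds = sum(audio_seconds_by_job.values())
    if total_seconds == 0:
        # Nothing was transcribed, so no job is assigned any cost.
        return {job: 0.0 for job in audio_seconds_by_job}
    return {
        job: total_cost_usd * (seconds / total_seconds)
        for job, seconds in audio_seconds_by_job.items()
    }
```

Proportional allocation ignores idle time and cold starts, so it attributes the whole bill to jobs; that may overstate per-job cost when the service sits mostly idle.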