---
title: AI Cost Tracker — Billing Units per Provider
tags: [reference, ai, cost-tracking, providers]
created: 2026-04-27
updated: 2026-04-27
---

# Billing Units per Provider

Reference for how each AI provider bills and how to extract usage data from their API responses.

## Gemini (Google AI / Vertex AI)

**Billing unit:** tokens (input + output separately)

**SDK:** `google-genai` Python SDK

**How to get usage:**
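```python
from google import genai

client = genai.Client()  # API key / Vertex AI settings are read from the environment

# Inside an async function: async methods live under client.aio
# (client.models.generate_content is the synchronous variant)
response = await client.aio.models.generate_content(...)

input_tokens  = response.usage_metadata.prompt_token_count
output_tokens = response.usage_metadata.candidates_token_count
total_tokens  = response.usage_metadata.total_token_count
```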

> ⚠️ `usage_metadata` is available on all `generate_content` responses, including multimodal ones (video + text prompts). Before the cost-tracker integration, video-accessibility did **not** read it; reading it was added in Phase 1.
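
A minimal sketch of normalising `usage_metadata` into the two counts the tracker records. It treats any missing or `None` count as 0, and it assumes that `thoughts_token_count`, which Gemini 2.5 thinking models report separately, is billed at the output rate; confirm that against current Gemini pricing. `TokenUsage` and `gemini_usage` are hypothetical names, and the response is faked with `SimpleNamespace` so the snippet runs offline.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int


def gemini_usage(response: Any) -> TokenUsage:
    """Hypothetical helper: read Gemini usage_metadata, treating missing counts as 0."""
    meta = getattr(response, "usage_metadata", None)
    if meta is None:
        return TokenUsage(0, 0)
    prompt = getattr(meta, "prompt_token_count", None) or 0
    candidates = getattr(meta, "candidates_token_count", None) or 0
    # Assumption: thinking tokens are billed at the output rate
    thoughts = getattr(meta, "thoughts_token_count", None) or 0
    return TokenUsage(input_tokens=prompt, output_tokens=candidates + thoughts)


# Stand-in object shaped like a real response
fake = SimpleNamespace(usage_metadata=SimpleNamespace(
    prompt_token_count=1200, candidates_token_count=300, thoughts_token_count=None))
print(gemini_usage(fake))  # TokenUsage(input_tokens=1200, output_tokens=300)
```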

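**Token estimation before the call:**
- Text: `len(text) / 4` (rough heuristic; actual tokenisation varies ±30%)
- Video file: Google's token-counting docs give fixed rates of about 263 tokens per second of video plus 32 tokens per second of audio, so a 1-minute clip is roughly 60 × (263 + 32) ≈ 17,700 tokens.
  - Exact: pass the uploaded file (plus the prompt) to `client.models.count_tokens(model=..., contents=...)` before generating.
- Image: 258 tokens when both dimensions are ≤ 384 px; larger images are split into 768×768 tiles at 258 tokens per tile.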
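
The heuristics above can be turned into a pre-call budget check. This is a sketch under stated assumptions: the text estimate is padded by the full 30% spread to act as an upper bound, and the per-second video/audio rates are passed in rather than hard-coded because Google revises them. The 263/32 values in the usage line are our reading of Google's docs at time of writing, not a guarantee. Function names are hypothetical.

```python
import math


def estimate_text_tokens(text: str, margin: float = 0.30) -> int:
    """Upper-bound estimate: len(text) / 4, padded by the tokenisation spread."""
    return math.ceil(len(text) / 4 * (1 + margin))


def estimate_av_tokens(duration_s: float, video_tokens_per_s: float,
                       audio_tokens_per_s: float) -> int:
    """Video plus its audio track, at caller-supplied per-second rates."""
    return math.ceil(duration_s * (video_tokens_per_s + audio_tokens_per_s))


prompt = "Describe every on-screen action for a blind viewer."
print(estimate_text_tokens(prompt))
# Assumed rates (verify against current Google docs before relying on them)
print(estimate_av_tokens(60, video_tokens_per_s=263, audio_tokens_per_s=32))  # 17700
```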

**Pricing:** auto-synced from LiteLLM. See [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]].
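
A sketch of the cost calculation once prices are synced. It assumes the synced entry keeps LiteLLM's per-token keys `input_cost_per_token` and `output_cost_per_token` (as used in LiteLLM's `model_prices_and_context_window.json`); the prices below are invented for illustration, and `token_cost` is a hypothetical name.

```python
def token_cost(entry: dict, input_tokens: int, output_tokens: int) -> float:
    """USD cost from a LiteLLM-style price entry holding per-token prices."""
    return (input_tokens * entry["input_cost_per_token"]
            + output_tokens * entry["output_cost_per_token"])


# Hypothetical prices: $0.30 per 1M input tokens, $2.50 per 1M output tokens
entry = {
    "input_cost_per_token": 0.30 / 1_000_000,
    "output_cost_per_token": 2.50 / 1_000_000,
}
cost = token_cost(entry, input_tokens=1_200, output_tokens=300)
print(f"${cost:.6f}")  # $0.001110
```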

---

## Gemini TTS (audio generation via `generate_content`)

**Billing unit:** tokens: text input tokens plus audio output tokens, with audio output priced at a different rate from text output

**SDK:** the same `google-genai` SDK, with `response_modalities=["AUDIO"]`

**How to get usage:**
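```python
from google.genai.types import GenerateContentConfig

# Inside an async function, reusing the genai.Client() from the Gemini section
response = await client.aio.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=...,
    config=GenerateContentConfig(response_modalities=["AUDIO"]),
)
input_tokens  = response.usage_metadata.prompt_token_count      # text prompt
output_tokens = response.usage_metadata.candidates_token_count  # generated audio
```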

The audio output token rate differs from the text output rate; check the LiteLLM entry for `gemini-2.5-flash-preview-tts` to confirm which rate the tracker applies.
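
Because the audio rate differs, a TTS call's cost has to be split across two rates. This sketch assumes the price entry exposes the audio rate under `output_cost_per_audio_token` (verify the key name in the synced LiteLLM data) and falls back to `output_cost_per_token` when it is absent. `tts_cost` and all prices shown are hypothetical.

```python
def tts_cost(entry: dict, input_tokens: int, audio_output_tokens: int) -> float:
    """Text prompt at the text input rate, generated audio at the audio output rate."""
    # Assumed key name; fall back to the generic output rate if missing
    audio_rate = entry.get("output_cost_per_audio_token")
    if audio_rate is None:
        audio_rate = entry["output_cost_per_token"]
    return input_tokens * entry["input_cost_per_token"] + audio_output_tokens * audio_rate


# Hypothetical prices per token
entry = {
    "input_cost_per_token": 0.50 / 1_000_000,
    "output_cost_per_token": 2.50 / 1_000_000,
    "output_cost_per_audio_token": 10.00 / 1_000_000,
}
print(tts_cost(entry, input_tokens=100, audio_output_tokens=1_000))
```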

---

## ElevenLabs TTS

**Billing unit:** characters (input text length)

**SDK:** custom HTTP (`aiohttp` POST to `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}`)

**Response:** returns raw audio bytes. **No usage metadata in response.**

**How to measure:** compute `len(text)` at the call site **before** making the request:

```python
char_count = len(text)
# make the ElevenLabs call
await ct.record(..., chars=char_count, model="eleven_multilingual_v2", ...)
```

**Subscription vs pay-as-you-go:** ElevenLabs bills against a monthly character quota; once the quota is exceeded, the pay-as-you-go rate applies. The cost-tracker assumes the pay-as-you-go rate for every character (a conservative upper bound). If you are on a subscription plan, adjust via an admin override.
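
A sketch of the difference between the tracker's conservative assumption and quota-aware billing. `payg_rate_per_1k` and `remaining_quota` are hypothetical parameters (the tracker does not know the remaining quota); passing `None` reproduces the tracker's "bill every character" behaviour. The rate in the usage lines is invented.

```python
from typing import Optional


def elevenlabs_cost(chars: int, payg_rate_per_1k: float,
                    remaining_quota: Optional[int] = None) -> float:
    """Cost of one request; remaining_quota=None bills every character (tracker default)."""
    billable = chars if remaining_quota is None else max(0, chars - remaining_quota)
    return billable / 1000 * payg_rate_per_1k


# Hypothetical rate: $0.30 per 1,000 characters
print(elevenlabs_cost(5_000, 0.30))                         # conservative: all chars billed
print(elevenlabs_cost(5_000, 0.30, remaining_quota=4_000))  # only the 1,000-char overage
```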

---

## Google Cloud TTS

**Billing unit:** characters (input text length, after SSML stripping)

**SDK:** `google.cloud.texttospeech` Python SDK

**Response:** `SynthesizeSpeechResponse` with `audio_content` (bytes). **No character count in response.**

**How to measure:**
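```python
# SynthesisInput carries either .text or .ssml; the unused field is an empty string
char_count = len(synthesis_input.text or synthesis_input.ssml)
# For SSML, len(ssml) includes the tags, so it over-counts a stripped-character
# bill; treat it as a conservative upper bound
await ct.record(..., chars=char_count, model="standard", ...)
```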
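
To bracket an SSML bill instead of only over-counting, one rough approach is to strip tags with a regex for the lower figure and keep `len(ssml)` as the upper one. The regex is an assumption about what "stripping" removes, not Google's billing logic, and it leaves entities such as `&amp;` unexpanded; `ssml_char_bounds` is a hypothetical name.

```python
import re

_TAG = re.compile(r"<[^>]*>")  # any <...> tag, including self-closing ones


def ssml_char_bounds(ssml: str) -> tuple[int, int]:
    """(stripped, raw) character counts for an SSML string."""
    stripped = _TAG.sub("", ssml)
    return len(stripped), len(ssml)


ssml = '<speak>Hello <break time="300ms"/>world</speak>'
print(ssml_char_bounds(ssml))  # (11, 47)
```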

**Voice tiers and pricing:**

| Voice type | Billing model name | Price per 1M chars |
|---|---|---|
| Standard | `google_tts/standard` | $4.00 |
| WaveNet | `google_tts/wavenet` | $16.00 |
| Neural2 | `google_tts/neural2` | $16.00 |
| Studio | `google_tts/studio` | $160.00 |

Defined in `pricing/models.yaml` in the cost-tracker repo.
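
A sketch of applying the table: cost is characters × price per 1M ÷ 1,000,000. The dict below simply mirrors the table for illustration (the real values live in `pricing/models.yaml`), ignores any monthly free-tier allowance, and `google_tts_cost` is a hypothetical name.

```python
# USD per 1M characters, copied from the table above
GOOGLE_TTS_PRICE_PER_1M = {
    "google_tts/standard": 4.00,
    "google_tts/wavenet": 16.00,
    "google_tts/neural2": 16.00,
    "google_tts/studio": 160.00,
}


def google_tts_cost(model: str, chars: int) -> float:
    """Raises KeyError for a model missing from the price table."""
    return chars * GOOGLE_TTS_PRICE_PER_1M[model] / 1_000_000


print(google_tts_cost("google_tts/wavenet", 2_500))  # 0.04
```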

---

## OpenAI (future)

**Billing unit:** tokens (input + output)

```python
response = client.chat.completions.create(...)
input_tokens  = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens
```

**Pricing:** auto-synced from LiteLLM.
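
If prompt caching matters for cost, the usage object also reports `prompt_tokens_details.cached_tokens`, which is a subset of `prompt_tokens` (cached input is billed at a discount), while `completion_tokens` already includes any reasoning tokens. A sketch of splitting them, with the response faked via `SimpleNamespace`; `openai_usage` is a hypothetical name.

```python
from types import SimpleNamespace
from typing import Any


def openai_usage(usage: Any) -> dict:
    """cached_tokens is counted inside prompt_tokens, so subtract it out."""
    details = getattr(usage, "prompt_tokens_details", None)
    cached = (getattr(details, "cached_tokens", None) or 0) if details is not None else 0
    return {
        "input_tokens": usage.prompt_tokens - cached,
        "cached_input_tokens": cached,
        "output_tokens": usage.completion_tokens,  # already includes reasoning tokens
    }


usage = SimpleNamespace(prompt_tokens=2000, completion_tokens=150,
                        prompt_tokens_details=SimpleNamespace(cached_tokens=1536))
print(openai_usage(usage))
# {'input_tokens': 464, 'cached_input_tokens': 1536, 'output_tokens': 150}
```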

---

## Anthropic Claude (future)

**Billing unit:** tokens (input + output)

```python
response = client.messages.create(...)
input_tokens  = response.usage.input_tokens
output_tokens = response.usage.output_tokens
```

**Pricing:** auto-synced from LiteLLM.
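
Unlike OpenAI, Anthropic's `input_tokens` excludes cached tokens: cache writes and reads are reported separately as `cache_creation_input_tokens` and `cache_read_input_tokens`, each with its own rate, so total input is the sum of all three. A sketch of capturing them, with the response faked via `SimpleNamespace`; `anthropic_usage` is a hypothetical name.

```python
from types import SimpleNamespace
from typing import Any


def anthropic_usage(usage: Any) -> dict:
    """input_tokens excludes cache writes/reads, which are reported separately."""
    return {
        "input_tokens": usage.input_tokens,
        "cache_write_tokens": getattr(usage, "cache_creation_input_tokens", None) or 0,
        "cache_read_tokens": getattr(usage, "cache_read_input_tokens", None) or 0,
        "output_tokens": usage.output_tokens,
    }


usage = SimpleNamespace(input_tokens=50, cache_creation_input_tokens=0,
                        cache_read_input_tokens=1800, output_tokens=200)
print(anthropic_usage(usage))
```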

---

## Whisper (self-hosted)

**Not billed per token.** Runs on Cloud Run / GPU compute.

Billing is infrastructure cost (compute time), which Phase 1 does not track.
Phase 2 (planned): track `audio_duration_seconds` and approximate cost from Cloud Run billing data.
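
Since Phase 2 is only planned, this is purely illustrative: one way to approximate cost from `audio_duration_seconds` is to multiply by a measured real-time factor (processing seconds per audio second) and a per-compute-second rate derived from Cloud Run billing data. Both the factor and the rate are hypothetical inputs, as is the name `whisper_cost`.

```python
def whisper_cost(audio_duration_seconds: float, realtime_factor: float,
                 cost_per_compute_second: float) -> float:
    """compute seconds = audio seconds x real-time factor; cost = compute seconds x rate."""
    compute_seconds = audio_duration_seconds * realtime_factor
    return compute_seconds * cost_per_compute_second


# Hypothetical: 10 min of audio, transcribed at 0.25x real time, $0.0005 per compute second
print(whisper_cost(600, realtime_factor=0.25, cost_per_compute_second=0.0005))
```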