Vadym Samoilenko 41e0ee3ea1 vault backup: 2026-04-27 11:11:54

Billing Units per Provider

Reference for how each AI provider bills and how to extract usage data from their API responses.

Gemini (Google AI / Vertex AI)

Billing unit: tokens (input + output separately)

SDK: google-genai Python SDK

How to get usage:

# awaitable calls go through client.aio; client.models is the synchronous surface
response = await client.aio.models.generate_content(...)

input_tokens  = response.usage_metadata.prompt_token_count
output_tokens = response.usage_metadata.candidates_token_count
total_tokens  = response.usage_metadata.total_token_count

⚠️ usage_metadata is available on all generate_content responses, including multimodal ones (video + text prompts). video-accessibility did not read it before the cost-tracker integration; reading it was added as part of Phase 1.

Token estimation before the call:

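Text: len(text) / 4 (rough heuristic; actual tokenisation varies by about ±30%)
Video file: use Google's published token rates:
- Default resolution: about 263 tokens per second of video plus 32 tokens per second of audio, so a 1 min video ≈ 18,000 tokens
- Exact: call client.models.count_tokens(...) with the uploaded file before the generate_content call (the uploaded file's metadata does not include a token count)
Image: 258 tokens for an image up to 384×384 px; larger images are split into 768×768 tiles at 258 tokens each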
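
The text heuristic above can be wrapped in a small helper that also returns the ±30% band, which is handy for pre-call budget checks. This is a sketch under the note's own assumptions (4 characters per token, 30% tolerance); estimate_text_tokens is a hypothetical name.

```python
import math


def estimate_text_tokens(
    text: str, chars_per_token: float = 4.0, tolerance: float = 0.30
) -> tuple[int, int, int]:
    """Return (low, estimate, high) token counts for a text prompt."""
    estimate = math.ceil(len(text) / chars_per_token)
    # Widen the point estimate by the stated tolerance in both directions.
    low = math.floor(estimate * (1 - tolerance))
    high = math.ceil(estimate * (1 + tolerance))
    return low, estimate, high
```

Budgeting against the high value errs on the side of overestimating cost; the real count still comes from usage_metadata after the call.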

Pricing: auto-synced from LiteLLM. See wiki/tech-patterns/cost-tracker-pricing-sources.

Gemini TTS (audio generation via generate_content)

Billing unit: tokens (output audio tokens, different rate from text)

SDK: same google-genai, with response_modalities=["AUDIO"]

How to get usage:

from google.genai import types

# awaitable calls go through client.aio
response = await client.aio.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents=...,
    config=types.GenerateContentConfig(response_modalities=["AUDIO"]),
)
output_tokens = response.usage_metadata.candidates_token_count

Audio output token rate differs from text output rate — verify in LiteLLM for model gemini-2.5-flash-preview-tts.
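
Because the audio output rate is separate from the text rates, the cost of one TTS call has two terms. A minimal sketch of that arithmetic follows; tts_call_cost is a hypothetical name and the rates are parameters on purpose, since the real values come from the LiteLLM pricing data.

```python
def tts_call_cost(
    input_tokens: int,
    audio_output_tokens: int,
    input_rate_per_million: float,
    audio_output_rate_per_million: float,
) -> float:
    """Cost of one TTS call, with rates given as price per 1M tokens."""
    input_cost = input_tokens * input_rate_per_million
    # Audio output tokens are priced at their own rate, not the text output rate.
    output_cost = audio_output_tokens * audio_output_rate_per_million
    return (input_cost + output_cost) / 1_000_000
```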

ElevenLabs TTS

Billing unit: characters (input text length)

SDK: none; custom HTTP client (aiohttp POST to https://api.elevenlabs.io/v1/text-to-speech/{voice_id})

Response: returns raw audio bytes. No usage metadata in response.

How to measure: compute len(text) at the call site before making the request:

char_count = len(text)
# make the ElevenLabs call
await ct.record(..., chars=char_count, model="eleven_multilingual_v2", ...)

Subscription vs pay-as-you-go: ElevenLabs bills against a monthly character quota. When quota is exceeded, pay-as-you-go rate applies. The cost-tracker assumes pay-as-you-go for all characters (conservative upper bound). Adjust via admin override if on a subscription plan.
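
The quota logic described above can be sketched as a split of each request into characters covered by the monthly quota and overage characters. split_billable_chars and its parameters are hypothetical, not part of the cost-tracker or the ElevenLabs API.

```python
def split_billable_chars(
    chars: int, used_this_period: int, monthly_quota: int
) -> tuple[int, int]:
    """Split a request into (chars covered by the quota, overage chars)."""
    remaining = max(monthly_quota - used_this_period, 0)
    covered = min(chars, remaining)
    return covered, chars - covered
```

Passing monthly_quota=0 reproduces the cost-tracker's upper-bound assumption: every character is treated as pay-as-you-go.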

Google Cloud TTS

Billing unit: characters (input text length, including spaces; for SSML input the tags count too, except <mark>)

SDK: google.cloud.texttospeech Python SDK

Response: SynthesizeSpeechResponse with audio_content (bytes). No character count in response.

How to measure:

char_count = len(synthesis_input.text)
# for SSML input use len(synthesis_input.ssml): Google bills SSML tags too, except <mark>
await ct.record(..., chars=char_count, model="standard", ...)

Voice tiers and pricing:

Voice type	Billing model name	Price per 1M chars
Standard	`google_tts/standard`	$4.00
WaveNet	`google_tts/wavenet`	$16.00
Neural2	`google_tts/neural2`	$16.00
Studio	`google_tts/studio`	$160.00

Defined in pricing/models.yaml in the cost-tracker repo.
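
As a worked example of the table above, the cost of a request is chars / 1,000,000 × the tier price. The sketch below hard-codes the table values in a dict purely for illustration; the actual schema of pricing/models.yaml is not shown here, and google_tts_cost is a hypothetical name.

```python
from decimal import Decimal

# Values copied from the table above (USD per 1M characters).
PRICE_PER_MILLION_CHARS = {
    "google_tts/standard": Decimal("4.00"),
    "google_tts/wavenet": Decimal("16.00"),
    "google_tts/neural2": Decimal("16.00"),
    "google_tts/studio": Decimal("160.00"),
}


def google_tts_cost(chars: int, model: str) -> Decimal:
    """Cost in USD for `chars` billable characters on the given voice tier."""
    return PRICE_PER_MILLION_CHARS[model] * chars / Decimal(1_000_000)
```

Decimal is used instead of float so that per-request costs sum without rounding drift.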

OpenAI (future)

Billing unit: tokens (input + output)

response = client.chat.completions.create(...)
input_tokens  = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens

Pricing: auto-synced from LiteLLM.

Anthropic Claude (future)