obsidian/wiki/agent-sdk/cost-tracking.md
2026-04-17 13:19:25 +01:00

5 KiB

title aliases tags sources created updated
Cost and Token Usage Tracking
cost-tracking
token-usage
usage-tracking
agent-sdk
cost
tokens
billing
observability
raw/Track cost and usage.md
2026-04-17 2026-04-17

Cost and Token Usage Tracking

The Claude Agent SDK exposes per-step and per-model token usage through the message stream. All cost figures are client-side estimates — not authoritative billing data.

Key Takeaways

  • total_cost_usd / costUSD are estimates computed from a bundled price table; use the Usage and Cost API or Console for billing truth
  • Cost is scoped to a single query() call — sessions do not auto-accumulate; sum manually
  • Parallel tool calls produce multiple assistant messages sharing the same iddeduplicate by ID to avoid inflated token counts
  • The result message is the most reliable place to read cost; prefer total_cost_usd there over summing per-step values
  • Costs are tracked even on failed/error result subtypes — tokens were consumed up to the failure point
  • Prompt caching is automatic; two extra fields cache_creation_input_tokens / cache_read_input_tokens track cache economics

Scoping: query / step / session

Scope What it is Cost reported?
query() call One invocation; may involve multiple steps Yes — in result message
Step Single request/response cycle within a query() Yes — on each AssistantMessage
Session Multiple query() calls linked by session ID No built-in total; accumulate yourself

Get Total Cost of a Query

Read total_cost_usd from the result message:

for await (const message of query({ prompt: "Summarize this project" })) {
  if (message.type === "result") {
    console.log(`Total cost: $${message.total_cost_usd}`);
  }
}

Python equivalent: message.total_cost_usd on ResultMessage.

Track Per-Step Usage (with Deduplication)

Parallel tool calls share the same message.message.id. Always deduplicate:

const seenIds = new Set<string>();
let totalInputTokens = 0;
let totalOutputTokens = 0;

for await (const message of query({ prompt: "..." })) {
  if (message.type === "assistant") {
    const msgId = message.message.id;
    if (!seenIds.has(msgId)) {
      seenIds.add(msgId);
      totalInputTokens += message.message.usage.input_tokens;
      totalOutputTokens += message.message.usage.output_tokens;
    }
  }
}

Python fields: message.usage, message.message_id.

Break Down Usage Per Model

result.modelUsage (TS) / result.model_usage (Python) maps model name → tokens + cost. Useful for multi-model setups (e.g., Haiku subagents + Opus main agent):

for await (const message of query({ prompt: "..." })) {
  if (message.type !== "result") continue;
  for (const [model, usage] of Object.entries(message.modelUsage)) {
    console.log(`${model}: $${usage.costUSD.toFixed(4)}`);
    console.log(`  Input: ${usage.inputTokens}, Output: ${usage.outputTokens}`);
    console.log(`  Cache read: ${usage.cacheReadInputTokens}, Cache create: ${usage.cacheCreationInputTokens}`);
  }
}

Accumulate Costs Across Multiple Calls

let totalSpend = 0;
for (const prompt of prompts) {
  for await (const message of query({ prompt })) {
    if (message.type === "result") {
      totalSpend += message.total_cost_usd;
    }
  }
}
console.log(`Total spend: $${totalSpend.toFixed(4)}`);

Edge Cases

Scenario Guidance
Output token discrepancy for same ID Use the highest value; prefer total_cost_usd from result
Failed/error conversations Always read cost from result regardless of subtype
Cache tokens Track cache_creation_input_tokens and cache_read_input_tokens separately; charged at different rates
Price drift Re-install SDK or use Usage API when accuracy matters

TypeScript vs Python Field Names

Concept TypeScript Python
Per-step usage message.message.usage message.usage
Per-step ID message.message.id message.message_id
Per-model breakdown result.modelUsage result.model_usage
Total cost result.total_cost_usd result.total_cost_usd
Cache fields usage.cacheReadInputTokens message.usage.get("cache_read_input_tokens", 0)

Sources