Cost and Token Usage Tracking

The Claude Agent SDK exposes per-step and per-model token usage through the message stream. All cost figures are client-side estimates — not authoritative billing data.

Key Takeaways

total_cost_usd / costUSD are estimates computed from a bundled price table; use the Usage and Cost API or Console for billing truth
Cost is scoped to a single query() call — sessions do not auto-accumulate; sum manually
Parallel tool calls produce multiple assistant messages sharing the same id — deduplicate by ID to avoid inflated token counts
The result message is the most reliable place to read cost; prefer total_cost_usd there over summing per-step values
Costs are tracked even on failed/error result subtypes — tokens were consumed up to the failure point
Prompt caching is automatic; the usage fields cache_creation_input_tokens and cache_read_input_tokens report cache writes and cache reads, which are charged at different rates

Scoping: query / step / session

Scope	What it is	Cost reported?
`query()` call	One invocation; may involve multiple steps	Yes — in `result` message
Step	Single request/response cycle within a `query()`	Yes — on each `AssistantMessage`
Session	Multiple `query()` calls linked by session ID	No built-in total; accumulate yourself

Get Total Cost of a Query

Read total_cost_usd from the result message:

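import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({ prompt: "Summarize this project" })) {
  // The result message arrives once, at the end of the query() call
  if (message.type === "result") {
    console.log(`Total cost: $${message.total_cost_usd}`);
  }
}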

Python equivalent: message.total_cost_usd on ResultMessage.

Track Per-Step Usage (with Deduplication)

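When Claude issues parallel tool calls, the stream contains several assistant messages that share the same message.message.id. Count each ID once, otherwise token totals are inflated: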

const seenIds = new Set<string>();
let totalInputTokens = 0;
let totalOutputTokens = 0;

for await (const message of query({ prompt: "..." })) {
  if (message.type === "assistant") {
    const msgId = message.message.id;
    if (!seenIds.has(msgId)) {
      seenIds.add(msgId);
      totalInputTokens += message.message.usage.input_tokens;
      totalOutputTokens += message.message.usage.output_tokens;
    }
  }
}

In Python, the corresponding fields on AssistantMessage are message.usage and message.message_id.

Break Down Usage Per Model

result.modelUsage (TS) / result.model_usage (Python) maps each model name to its token counts and estimated cost. This is useful for multi-model setups (e.g., Haiku subagents + Opus main agent):

for await (const message of query({ prompt: "..." })) {
  if (message.type !== "result") continue;
  for (const [model, usage] of Object.entries(message.modelUsage)) {
    console.log(`${model}: $${usage.costUSD.toFixed(4)}`);
    console.log(`  Input: ${usage.inputTokens}, Output: ${usage.outputTokens}`);
    console.log(`  Cache read: ${usage.cacheReadInputTokens}, Cache create: ${usage.cacheCreationInputTokens}`);
  }
}
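
The per-model map can also be reduced to a few ratios, such as each model's share of the estimated cost and how much of its input was served from cache. The following is a minimal sketch, not part of the SDK: the ModelUsageEntry and ModelSummary types and the summarizeModelUsage function are hypothetical names that mirror only the fields read in the loop above, the model names are placeholders, and it assumes inputTokens excludes cached tokens.

```typescript
// Hypothetical shape: mirrors only the modelUsage fields used in this page.
interface ModelUsageEntry {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  costUSD: number;
}

interface ModelSummary {
  model: string;
  costUSD: number;
  costShare: number;      // fraction of the total estimated cost, 0..1
  cacheReadShare: number; // fraction of input-side tokens read from cache, 0..1
}

// Assumption: inputTokens counts only uncached input, so the input-side
// total is inputTokens + cacheReadInputTokens + cacheCreationInputTokens.
function summarizeModelUsage(
  modelUsage: Record<string, ModelUsageEntry>
): ModelSummary[] {
  const models = Object.keys(modelUsage);
  const totalCost = models.reduce((sum, m) => sum + modelUsage[m].costUSD, 0);
  return models
    .map((model) => {
      const u = modelUsage[model];
      const inputSide =
        u.inputTokens + u.cacheReadInputTokens + u.cacheCreationInputTokens;
      return {
        model,
        costUSD: u.costUSD,
        costShare: totalCost > 0 ? u.costUSD / totalCost : 0,
        cacheReadShare: inputSide > 0 ? u.cacheReadInputTokens / inputSide : 0,
      };
    })
    .sort((a, b) => b.costUSD - a.costUSD); // most expensive model first
}

// Placeholder data standing in for message.modelUsage on a result message.
const summary = summarizeModelUsage({
  "main-model": { inputTokens: 1000, outputTokens: 500, cacheReadInputTokens: 3000, cacheCreationInputTokens: 0, costUSD: 0.03 },
  "subagent-model": { inputTokens: 2000, outputTokens: 100, cacheReadInputTokens: 0, cacheCreationInputTokens: 0, costUSD: 0.01 },
});
for (const row of summary) {
  console.log(`${row.model}: ${(row.costShare * 100).toFixed(1)}% of cost, ${(row.cacheReadShare * 100).toFixed(1)}% cache reads`);
}
```

Because costUSD is itself a client-side estimate, the shares are best read as relative indicators rather than billing figures.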

Accumulate Costs Across Multiple Calls

let totalSpend = 0;
for (const prompt of prompts) {
  for await (const message of query({ prompt })) {
    if (message.type === "result") {
      totalSpend += message.total_cost_usd;
    }
  }
}
console.log(`Total spend: $${totalSpend.toFixed(4)}`);
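
When several sessions run side by side, the same idea can be keyed by session ID. The sketch below is one possible shape for that bookkeeping, under stated assumptions: SessionCostTracker and ResultLike are hypothetical names, the session_id field is assumed to be present on the result message, and the subtype values in the sample data are illustrative. Error results are recorded like any other, since tokens were consumed up to the failure point.

```typescript
// Hypothetical minimal shape of a result message: only the fields used here.
interface ResultLike {
  session_id: string;     // assumed field carrying the session ID
  subtype: string;        // "success" or an error subtype
  total_cost_usd: number; // client-side estimate for one query() call
}

class SessionCostTracker {
  private totals = new Map<string, number>();

  // Record the result of one query() call, whatever its subtype.
  record(result: ResultLike): void {
    const previous = this.totals.get(result.session_id) ?? 0;
    this.totals.set(result.session_id, previous + result.total_cost_usd);
  }

  // Estimated spend for one session; 0 if the session is unknown.
  sessionTotal(sessionId: string): number {
    return this.totals.get(sessionId) ?? 0;
  }

  // Estimated spend across every recorded session.
  grandTotal(): number {
    let sum = 0;
    this.totals.forEach((value) => {
      sum += value;
    });
    return sum;
  }
}

// Sample data standing in for result messages from three query() calls.
const tracker = new SessionCostTracker();
tracker.record({ session_id: "s1", subtype: "success", total_cost_usd: 0.5 });
tracker.record({ session_id: "s1", subtype: "error_max_turns", total_cost_usd: 0.25 });
tracker.record({ session_id: "s2", subtype: "success", total_cost_usd: 0.125 });
console.log(`s1: $${tracker.sessionTotal("s1").toFixed(4)}`);
console.log(`all sessions: $${tracker.grandTotal().toFixed(4)}`);
```

In a real loop, record() would be called inside the message.type === "result" branch shown above.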

Edge Cases

Scenario	Guidance
Output token discrepancy for same ID	Use the highest value; prefer `total_cost_usd` from `result`
Failed/error conversations	Always read cost from `result` regardless of `subtype`
Cache tokens	Track `cache_creation_input_tokens` and `cache_read_input_tokens` separately; charged at different rates
Price drift	The bundled price table can lag behind current pricing; upgrade the SDK to refresh it, or use the Usage and Cost API when accuracy matters
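
The "use the highest value" guidance for repeated IDs can be sketched as a pure function that runs without the SDK. StepUsage, TokenTotals, and dedupeUsage are hypothetical names introduced for illustration, and applying the maximum to input tokens as well as output tokens is an assumption beyond what the table states.

```typescript
// Hypothetical minimal shape: one entry per assistant message seen.
interface StepUsage {
  id: string;           // message.message.id in the TypeScript SDK
  inputTokens: number;  // usage.input_tokens
  outputTokens: number; // usage.output_tokens
}

interface TokenTotals {
  inputTokens: number;
  outputTokens: number;
}

// Collapse entries that share an ID into one, keeping the largest value
// reported for each counter, then sum across the distinct IDs.
function dedupeUsage(steps: StepUsage[]): TokenTotals {
  const byId = new Map<string, TokenTotals>();
  for (const step of steps) {
    const previous = byId.get(step.id);
    byId.set(step.id, {
      inputTokens: Math.max(previous?.inputTokens ?? 0, step.inputTokens),
      outputTokens: Math.max(previous?.outputTokens ?? 0, step.outputTokens),
    });
  }
  const totals: TokenTotals = { inputTokens: 0, outputTokens: 0 };
  byId.forEach((usage) => {
    totals.inputTokens += usage.inputTokens;
    totals.outputTokens += usage.outputTokens;
  });
  return totals;
}

// Two messages share msg_a and disagree on output tokens; the higher wins.
const dedupedTotals = dedupeUsage([
  { id: "msg_a", inputTokens: 100, outputTokens: 20 },
  { id: "msg_a", inputTokens: 100, outputTokens: 35 },
  { id: "msg_b", inputTokens: 250, outputTokens: 40 },
]);
console.log(dedupedTotals); // { inputTokens: 350, outputTokens: 75 }
```

Even with this in place, total_cost_usd on the result message remains the preferred figure for cost; per-step totals are mainly useful for progress displays and debugging.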

TypeScript vs Python Field Names

Concept	TypeScript	Python
Per-step usage	`message.message.usage`	`message.usage`
Per-step ID	`message.message.id`	`message.message_id`
Per-model breakdown	`result.modelUsage`	`result.model_usage`
Total cost	`result.total_cost_usd`	`result.total_cost_usd`
Cache fields	`usage.cacheReadInputTokens`	`message.usage.get("cache_read_input_tokens", 0)`

Related

wiki/agent-sdk/agent-loop: how steps and query() calls are structured
wiki/agent-sdk/observability-opentelemetry: exporting traces and metrics to OTLP backends
wiki/agent-sdk/subagents: multi-model setups where per-model cost breakdown matters
wiki/agent-sdk/streaming-output: real-time message stream that carries usage events

Sources

raw/Track cost and usage.md — source: https://code.claude.com/docs/en/agent-sdk/cost-tracking
