| title | aliases | tags | sources | created | updated |
|---|---|---|---|---|---|
| Cost and Token Usage Tracking | | | | 2026-04-17 | 2026-04-17 |
Cost and Token Usage Tracking
The Claude Agent SDK exposes per-step and per-model token usage through the message stream. All cost figures are client-side estimates — not authoritative billing data.
Key Takeaways
- `total_cost_usd` / `costUSD` are estimates computed from a bundled price table; use the Usage and Cost API or Console for billing truth
- Cost is scoped to a single `query()` call; sessions do not auto-accumulate, so sum manually
- Parallel tool calls produce multiple assistant messages sharing the same `id`; deduplicate by ID to avoid inflated token counts
- The `result` message is the most reliable place to read cost; prefer `total_cost_usd` there over summing per-step values
- Costs are tracked even on failed/error result subtypes, because tokens were consumed up to the failure point
- Prompt caching is automatic; two extra fields, `cache_creation_input_tokens` and `cache_read_input_tokens`, track cache economics
Scoping: query / step / session
| Scope | What it is | Cost reported? |
|---|---|---|
| `query()` call | One invocation; may involve multiple steps | Yes, in the `result` message |
| Step | Single request/response cycle within a `query()` | Yes, on each `AssistantMessage` |
| Session | Multiple `query()` calls linked by session ID | No built-in total; accumulate yourself |
Get Total Cost of a Query
Read total_cost_usd from the result message:
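import { query } from "@anthropic-ai/claude-agent-sdk";

// The result message carries the estimated total for this query() call.
for await (const message of query({ prompt: "Summarize this project" })) {
  if (message.type === "result") {
    console.log(`Total cost: $${message.total_cost_usd}`);
  }
}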
Python equivalent: message.total_cost_usd on ResultMessage.
Track Per-Step Usage (with Deduplication)
Parallel tool calls share the same message.message.id. Always deduplicate:
const seenIds = new Set<string>();
let totalInputTokens = 0;
let totalOutputTokens = 0;
for await (const message of query({ prompt: "..." })) {
  if (message.type === "assistant") {
    const msgId = message.message.id;
    if (!seenIds.has(msgId)) {
      seenIds.add(msgId);
      totalInputTokens += message.message.usage.input_tokens;
      totalOutputTokens += message.message.usage.output_tokens;
    }
  }
}
Python fields: message.usage, message.message_id.
Break Down Usage Per Model
result.modelUsage (TS) / result.model_usage (Python) maps model name → tokens + cost. Useful for multi-model setups (e.g., Haiku subagents + Opus main agent):
for await (const message of query({ prompt: "..." })) {
  if (message.type !== "result") continue;
  for (const [model, usage] of Object.entries(message.modelUsage)) {
    console.log(`${model}: $${usage.costUSD.toFixed(4)}`);
    console.log(`  Input: ${usage.inputTokens}, Output: ${usage.outputTokens}`);
    console.log(`  Cache read: ${usage.cacheReadInputTokens}, Cache create: ${usage.cacheCreationInputTokens}`);
  }
}
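To make the cache fields in the per-model breakdown easier to interpret, the sketch below computes what share of each model's input-side tokens was read from cache. It is a minimal sketch, assuming each `modelUsage` entry carries the fields logged above and that `inputTokens` does not already include the cache counts. The `ModelUsageLike` type, the `cacheReadShare` helper, the model names, and the sample numbers are hypothetical and not part of the SDK.

```typescript
// Local stand-in for one entry of result.modelUsage (assumed shape, not an SDK type).
type ModelUsageLike = {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  costUSD: number;
};

// Fraction of input-side tokens that were read from cache.
// Assumes inputTokens excludes cache reads and cache writes.
function cacheReadShare(usage: ModelUsageLike): number {
  const totalInput =
    usage.inputTokens + usage.cacheReadInputTokens + usage.cacheCreationInputTokens;
  return totalInput === 0 ? 0 : usage.cacheReadInputTokens / totalInput;
}

// Hypothetical sample data standing in for message.modelUsage.
const sampleModelUsage: Record<string, ModelUsageLike> = {
  "main-model": {
    inputTokens: 200,
    outputTokens: 150,
    cacheReadInputTokens: 700,
    cacheCreationInputTokens: 100,
    costUSD: 0.0123,
  },
  "subagent-model": {
    inputTokens: 400,
    outputTokens: 60,
    cacheReadInputTokens: 0,
    cacheCreationInputTokens: 0,
    costUSD: 0.0004,
  },
};

for (const model of Object.keys(sampleModelUsage)) {
  const share = cacheReadShare(sampleModelUsage[model]);
  console.log(`${model}: ${(share * 100).toFixed(1)}% of input tokens read from cache`);
}
```

A low share on a long-running agent may indicate that prompts change too much between steps for caching to help, but that reading is a heuristic, not something the SDK reports.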
Accumulate Costs Across Multiple Calls
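// prompts is an array of prompt strings supplied by the caller.
let totalSpend = 0;
for (const prompt of prompts) {
  for await (const message of query({ prompt })) {
    if (message.type === "result") {
      // Add each call's estimate, whether it succeeded or failed.
      totalSpend += message.total_cost_usd;
    }
  }
}
console.log(`Total spend: $${totalSpend.toFixed(4)}`);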
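Because sessions have no built-in total, the same accumulation can be keyed by session ID. The following is a minimal sketch, assuming each result message exposes a `session_id` alongside `subtype` and `total_cost_usd`. The `ResultLike` type, the `addResultCost` and `sessionTotal` helpers, and the sample values are hypothetical stand-ins rather than SDK exports.

```typescript
// Local stand-in for the fields of a result message used here (assumed shape).
type ResultLike = {
  type: "result";
  subtype: string; // "success" or an error subtype; cost is recorded either way
  session_id: string;
  total_cost_usd: number;
};

// Adds one query() call's estimated cost to the running total for its session.
function addResultCost(totals: Record<string, number>, result: ResultLike): void {
  const previous = totals[result.session_id] ?? 0;
  totals[result.session_id] = previous + result.total_cost_usd;
}

// Returns the accumulated estimate for a session, or 0 if nothing was recorded.
function sessionTotal(totals: Record<string, number>, sessionId: string): number {
  return totals[sessionId] ?? 0;
}

// Hypothetical results from three query() calls across two sessions.
const sampleResults: ResultLike[] = [
  { type: "result", subtype: "success", session_id: "session-a", total_cost_usd: 0.01 },
  { type: "result", subtype: "error_max_turns", session_id: "session-a", total_cost_usd: 0.02 },
  { type: "result", subtype: "success", session_id: "session-b", total_cost_usd: 0.05 },
];

const sessionTotals: Record<string, number> = {};
for (const result of sampleResults) {
  addResultCost(sessionTotals, result);
}

for (const sessionId of Object.keys(sessionTotals)) {
  console.log(`${sessionId}: $${sessionTotal(sessionTotals, sessionId).toFixed(4)}`);
}
```

Failed calls are added on purpose, since tokens are consumed up to the failure point. The totals remain client-side estimates.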
Edge Cases
| Scenario | Guidance |
|---|---|
| Output token discrepancy for same ID | Use the highest value; prefer total_cost_usd from result |
| Failed/error conversations | Always read cost from result regardless of subtype |
| Cache tokens | Track cache_creation_input_tokens and cache_read_input_tokens separately; charged at different rates |
| Price drift | Update the SDK to refresh the bundled price table, or use the Usage and Cost API when accuracy matters |
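The first row of the table (use the highest value when one ID reports different output token counts) can be combined with ID deduplication in a small pure function. This is a sketch under stated assumptions: `StepUsageLike` is a hypothetical flattened view of a step's `id`, `input_tokens`, and `output_tokens`, and `sumStepUsage` is an illustrative helper, not an SDK function.

```typescript
// Hypothetical flattened view of one assistant message's ID and usage.
type StepUsageLike = {
  id: string;
  input_tokens: number;
  output_tokens: number;
};

// Counts each message ID once, keeping the entry with the highest output_tokens
// when the same ID appears more than once.
function sumStepUsage(steps: StepUsageLike[]): { inputTokens: number; outputTokens: number } {
  const byId: Record<string, StepUsageLike> = {};
  for (const step of steps) {
    const seen = byId[step.id];
    if (seen === undefined || step.output_tokens > seen.output_tokens) {
      byId[step.id] = step;
    }
  }

  let inputTokens = 0;
  let outputTokens = 0;
  for (const id of Object.keys(byId)) {
    inputTokens += byId[id].input_tokens;
    outputTokens += byId[id].output_tokens;
  }
  return { inputTokens, outputTokens };
}

// Hypothetical stream: "msg-1" appears twice with different output counts.
const sampleSteps: StepUsageLike[] = [
  { id: "msg-1", input_tokens: 100, output_tokens: 10 },
  { id: "msg-1", input_tokens: 100, output_tokens: 25 },
  { id: "msg-2", input_tokens: 50, output_tokens: 5 },
];

const stepTotals = sumStepUsage(sampleSteps);
console.log(`Input: ${stepTotals.inputTokens}, Output: ${stepTotals.outputTokens}`);
```

Even with this handling, the `result` message's `total_cost_usd` remains the preferred figure for cost. Per-step sums are mainly useful for seeing where tokens were spent.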
TypeScript vs Python Field Names
| Concept | TypeScript | Python |
|---|---|---|
| Per-step usage | `message.message.usage` | `message.usage` |
| Per-step ID | `message.message.id` | `message.message_id` |
| Per-model breakdown | `result.modelUsage` | `result.model_usage` |
| Total cost | `result.total_cost_usd` | `result.total_cost_usd` |
| Cache fields | `usage.cacheReadInputTokens` | `message.usage.get("cache_read_input_tokens", 0)` |
Related Articles
- wiki/agent-sdk/agent-loop: how steps and `query()` calls are structured
- wiki/agent-sdk/observability-opentelemetry: exporting traces and metrics to OTLP backends
- wiki/agent-sdk/subagents: multi-model setups where per-model cost breakdown matters
- wiki/agent-sdk/streaming-output: real-time message stream that carries usage events
Sources
- `raw/Track cost and usage.md` (source: https://code.claude.com/docs/en/agent-sdk/cost-tracking)