| title | aliases | tags | sources | created | updated |
|---|---|---|---|---|---|
| Streaming Output in Real-Time | | | | 2026-04-17 | 2026-04-17 |
Overview
By default the Agent SDK yields complete AssistantMessage objects after each full response. Enable partial message streaming to receive incremental tokens and tool-call deltas as they arrive.
- Python: set include_partial_messages=True in ClaudeAgentOptions
- TypeScript: set includePartialMessages: true in options
How It Works
When streaming is enabled, the SDK emits StreamEvent messages wrapping raw Claude API events before the final AssistantMessage. Your loop must check message type first, then inspect the nested event:
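from claude_agent_sdk import query
from claude_agent_sdk.types import StreamEvent

# Run inside an async function; `options` must have include_partial_messages=True.
async for message in query(prompt="...", options=options):
    # Partial output arrives wrapped in StreamEvent; other message types are skipped here.
    if isinstance(message, StreamEvent):
        event = message.event  # raw Claude API stream event (a dict)
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                # Print each text chunk as soon as it arrives.
                print(delta.get("text", ""), end="", flush=True)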
StreamEvent Structure
| Field | Type | Description |
|---|---|---|
| uuid | str | Unique event identifier |
| session_id | str | Session identifier |
| event | dict | Raw Claude API stream event |
| parent_tool_use_id | str \| None | Set when event is from a subagent |
TypeScript name: SDKPartialAssistantMessage with type: 'stream_event'
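As a rough illustration of how these fields might be used, the sketch below separates main-agent events from subagent events by checking parent_tool_use_id. FakeStreamEvent is a hypothetical stand-in that mirrors the four documented fields so the example runs without the SDK installed; it is not the SDK's own class, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeStreamEvent:
    """Hypothetical stand-in mirroring the documented StreamEvent fields."""
    uuid: str
    session_id: str
    event: dict[str, Any]
    parent_tool_use_id: Optional[str] = None


def is_subagent_event(message: FakeStreamEvent) -> bool:
    """True when the event carries a parent_tool_use_id (it came from a subagent)."""
    return message.parent_tool_use_id is not None


# Illustrative sample data, not captured SDK output.
messages = [
    FakeStreamEvent("u1", "s1", {"type": "message_start"}),
    FakeStreamEvent("u2", "s1", {"type": "message_start"}, parent_tool_use_id="toolu_01"),
]

main_agent = [m.uuid for m in messages if not is_subagent_event(m)]
subagent = [m.uuid for m in messages if is_subagent_event(m)]
```

With a real StreamEvent the same None check on parent_tool_use_id should apply, which lets a UI render subagent output in a separate pane or suppress it entirely.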
Common Event Types
| Event Type | Description |
|---|---|
| message_start | New message begins |
| content_block_start | New text or tool-use block begins |
| content_block_delta | Incremental update (text_delta or input_json_delta) |
| content_block_stop | Block complete |
| message_delta | Stop reason, usage counts |
| message_stop | Message complete |
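One way to get a feel for these event types is to log a one-line summary of each as it arrives. The sketch below works on plain dicts shaped like the raw events; describe_event is a hypothetical helper, the sample events are hand-written, and reading stop_reason from the nested delta of message_delta is an assumption based on the Claude API streaming format.

```python
from typing import Any


def describe_event(event: dict[str, Any]) -> str:
    """Return a short human-readable summary of a raw stream event dict."""
    kind = event.get("type")
    if kind == "content_block_start":
        block_type = event.get("content_block", {}).get("type", "unknown")
        return f"block started ({block_type})"
    if kind == "content_block_delta":
        delta_type = event.get("delta", {}).get("type", "unknown")
        return f"block delta ({delta_type})"
    if kind == "content_block_stop":
        return "block complete"
    if kind == "message_delta":
        # Assumed placement: stop_reason lives inside the event's "delta" dict.
        stop_reason = event.get("delta", {}).get("stop_reason")
        return f"message delta (stop_reason={stop_reason})"
    if kind in ("message_start", "message_stop"):
        return kind.replace("_", " ")
    return f"unhandled event type: {kind}"


# Hand-written sample events for illustration only.
events = [
    {"type": "message_start"},
    {"type": "content_block_start", "content_block": {"type": "text"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hi"}},
    {"type": "content_block_stop"},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]
summaries = [describe_event(e) for e in events]
```

The fallback branch is deliberate: unknown event types are reported rather than raised, so a logger like this keeps working if additional event types appear in the stream.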
Message Flow
StreamEvent (message_start)
StreamEvent (content_block_start) ← text block
StreamEvent (content_block_delta) ← text chunks ...
StreamEvent (content_block_stop)
StreamEvent (content_block_start) ← tool_use block
StreamEvent (content_block_delta) ← input_json_delta chunks ...
StreamEvent (content_block_stop)
StreamEvent (message_delta / message_stop)
AssistantMessage ← complete message
... tool executes ...
ResultMessage ← final result
Without streaming enabled you receive: SystemMessage, AssistantMessage, ResultMessage, and SDKCompactBoundaryMessage (TypeScript) / SystemMessage with subtype "compact_boundary" (Python).
Streaming Text
Look for content_block_delta → delta.type == "text_delta" → delta.text.
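The lookup path above can be sketched as a small pure function that takes a raw event dict (what message.event holds) and returns the text chunk, or an empty string for any other event. The helper names and the sample events are hypothetical, chosen so the example runs without the SDK.

```python
from typing import Any, Iterable


def text_from_event(event: dict[str, Any]) -> str:
    """Return the text carried by a text_delta event, or "" for anything else."""
    if event.get("type") != "content_block_delta":
        return ""
    delta = event.get("delta", {})
    if delta.get("type") != "text_delta":
        return ""
    return delta.get("text", "")


def collect_text(events: Iterable[dict[str, Any]]) -> str:
    """Concatenate every text chunk found in a sequence of raw events."""
    return "".join(text_from_event(e) for e in events)


# Hand-written sample events; the input_json_delta chunk must be ignored.
sample_events = [
    {"type": "content_block_start", "content_block": {"type": "text"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello, "}},
    {"type": "content_block_delta", "delta": {"type": "input_json_delta", "partial_json": "{"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "world"}},
    {"type": "content_block_stop"},
]
full_text = collect_text(sample_events)
```

Keeping the extraction separate from printing makes it easy to reuse the same function for a terminal, a web socket, or a test.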
Streaming Tool Calls
Three events to watch:
| Event | Action |
|---|---|
| content_block_start + content_block.type == "tool_use" | Tool starting; capture name |
| content_block_delta + delta.type == "input_json_delta" | Accumulate partial_json |
| content_block_stop | Tool call complete; use accumulated JSON |
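The three steps in the table can be sketched as a small accumulator fed with raw event dicts. ToolCallAccumulator is a hypothetical helper, not part of the SDK; the sample events are hand-written, and treating an empty payload as "no arguments" is an assumption.

```python
import json
from typing import Any, Optional


class ToolCallAccumulator:
    """Hypothetical helper: rebuilds tool calls from raw stream event dicts."""

    def __init__(self) -> None:
        self.name: Optional[str] = None          # tool currently being streamed
        self.chunks: list[str] = []              # partial_json fragments so far
        self.completed: list[tuple[str, dict[str, Any]]] = []

    def feed(self, event: dict[str, Any]) -> None:
        kind = event.get("type")
        if kind == "content_block_start":
            block = event.get("content_block", {})
            if block.get("type") == "tool_use":
                # Step 1: tool starting, capture its name.
                self.name = block.get("name", "unknown")
                self.chunks = []
        elif kind == "content_block_delta" and self.name is not None:
            delta = event.get("delta", {})
            if delta.get("type") == "input_json_delta":
                # Step 2: accumulate the partial JSON fragments.
                self.chunks.append(delta.get("partial_json", ""))
        elif kind == "content_block_stop" and self.name is not None:
            # Step 3: block complete, parse the accumulated JSON.
            raw = "".join(self.chunks)
            # Assumption: an empty payload means the tool takes no arguments.
            self.completed.append((self.name, json.loads(raw) if raw else {}))
            self.name = None
            self.chunks = []


# Hand-written sample events; the JSON is split mid-key on purpose.
acc = ToolCallAccumulator()
for ev in [
    {"type": "content_block_start", "content_block": {"type": "tool_use", "name": "Read"}},
    {"type": "content_block_delta", "delta": {"type": "input_json_delta", "partial_json": '{"file_pa'}},
    {"type": "content_block_delta", "delta": {"type": "input_json_delta", "partial_json": 'th": "README.md"}'}},
    {"type": "content_block_stop"},
]:
    acc.feed(ev)
```

Parsing is deferred until content_block_stop because individual partial_json fragments are not valid JSON on their own, as the split key in the sample shows.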
Building a Streaming UI
Use an in_tool flag to switch between rendering text tokens and showing a [Using ToolName...] status indicator:
in_tool = False  # True while a tool_use block is streaming
async for message in query(prompt="...", options=options):
    if isinstance(message, StreamEvent):
        event = message.event
        t = event.get("type")
        if t == "content_block_start":
            cb = event.get("content_block", {})
            if cb.get("type") == "tool_use":
                # Show a status indicator instead of raw tool input.
                print(f"\n[Using {cb['name']}...]", end="", flush=True)
                in_tool = True
        elif t == "content_block_delta":
            d = event.get("delta", {})
            # Render text tokens only when no tool call is in progress.
            if d.get("type") == "text_delta" and not in_tool:
                print(d.get("text", ""), end="", flush=True)
        elif t == "content_block_stop" and in_tool:
            # The tool_use block has finished streaming.
            print(" done", flush=True)
            in_tool = False
Known Limitations
- Extended thinking: when max_thinking_tokens / maxThinkingTokens is set, StreamEvent messages are not emitted; only complete messages arrive. Thinking is off by default, so streaming works unless you explicitly enable it.
- Structured output: the JSON result appears only in the final ResultMessage.structured_output, never as streaming deltas. See wiki/agent-sdk/structured-outputs.
Key Takeaways
- Set include_partial_messages=True (Python) / includePartialMessages: true (TypeScript) to opt in.
- Events arrive as StreamEvent wrappers around raw Claude API streaming events; you accumulate text/JSON yourself.
- Text: content_block_delta → text_delta → delta.text
- Tool input: content_block_delta → input_json_delta → delta.partial_json
- The complete AssistantMessage still arrives after all deltas, so you don't have to reconstruct it.
- Incompatible with extended thinking; structured output appears only in the final ResultMessage.
Related
- wiki/agent-sdk/agent-loop: full message lifecycle, turn structure, compaction
- wiki/agent-sdk/python-api-reference: ClaudeAgentOptions, StreamEvent, all types
- wiki/agent-sdk/typescript-api-reference: SDKPartialAssistantMessage, includePartialMessages
- wiki/agent-sdk/structured-outputs: JSON results from agents (not streaming)
- wiki/agent-sdk/user-input-approvals: canUseTool, Python streaming workaround
Sources
raw/Stream responses in real-time.md: official Agent SDK streaming output docs