obsidian/wiki/agent-sdk/streaming-output.md
2026-04-17 13:14:32 +01:00


title: Streaming Output in Real-Time
aliases: streaming, partial-messages, stream-events
tags: agent-sdk, streaming, python, typescript, real-time
sources: raw/Stream responses in real-time.md
created: 2026-04-17
updated: 2026-04-17

Overview

By default the Agent SDK yields complete AssistantMessage objects after each full response. Enable partial message streaming to receive incremental tokens and tool-call deltas as they arrive.

  • Python: set include_partial_messages=True in ClaudeAgentOptions
  • TypeScript: set includePartialMessages: true in options

How It Works

When streaming is enabled, the SDK emits StreamEvent messages wrapping raw Claude API events before the final AssistantMessage. Your loop must check the message type first, then inspect the nested event:

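import asyncio
from claude_agent_sdk import ClaudeAgentOptions, query
from claude_agent_sdk.types import StreamEvent

async def main():
    options = ClaudeAgentOptions(include_partial_messages=True)  # opt in to StreamEvent messages
    async for message in query(prompt="...", options=options):
        if isinstance(message, StreamEvent):  # skip complete messages
            event = message.event  # raw Claude API stream event (a dict)
            if event.get("type") == "content_block_delta":
                delta = event.get("delta", {})
                if delta.get("type") == "text_delta":
                    print(delta.get("text", ""), end="", flush=True)
asyncio.run(main())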

StreamEvent Structure

| Field | Type | Description |
|---|---|---|
| uuid | str | Unique event identifier |
| session_id | str | Session identifier |
| event | dict | Raw Claude API stream event |
| parent_tool_use_id | str or None | Set when the event comes from a subagent, otherwise None |

TypeScript name: SDKPartialAssistantMessage with type: 'stream_event'
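Because parent_tool_use_id is set only for subagent output, a UI can keep subagent tokens out of the main transcript or show them separately. A minimal sketch, assuming objects that expose the event and parent_tool_use_id fields from the table above. FakeStreamEvent is a hypothetical stand-in used only so the demo runs without the SDK, and the ids are made up:

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class FakeStreamEvent:
    """Hypothetical stand-in with the same four fields as StreamEvent (demo only)."""
    uuid: str
    session_id: str
    event: dict[str, Any]
    parent_tool_use_id: Optional[str] = None


def text_by_origin(messages: Iterable[Any]) -> dict[Optional[str], str]:
    """Join streamed text per origin.

    Key None collects the main agent's text; any other key is the
    parent_tool_use_id of the subagent call that produced the text.
    Works with any object exposing .event and .parent_tool_use_id.
    """
    chunks: dict[Optional[str], list[str]] = {}
    for msg in messages:
        event = msg.event
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            chunks.setdefault(msg.parent_tool_use_id, []).append(delta.get("text", ""))
    return {origin: "".join(parts) for origin, parts in chunks.items()}


def text_delta(text: str) -> dict[str, Any]:
    # Builds a hand-written raw text_delta event for the demo
    return {"type": "content_block_delta", "index": 0,
            "delta": {"type": "text_delta", "text": text}}


stream = [
    FakeStreamEvent("u1", "s1", text_delta("Hello ")),
    FakeStreamEvent("u2", "s1", text_delta("from a subagent"), parent_tool_use_id="toolu_demo"),
    FakeStreamEvent("u3", "s1", text_delta("world")),
]
print(text_by_origin(stream))
# {None: 'Hello world', 'toolu_demo': 'from a subagent'}
```

Grouping by origin instead of filtering lets the caller decide whether to hide subagent text or render it in its own panel.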

Common Event Types

| Event type | Description |
|---|---|
| message_start | A new message begins |
| content_block_start | A new text or tool_use block begins |
| content_block_delta | Incremental update to the current block (text_delta or input_json_delta) |
| content_block_stop | The current block is complete |
| message_delta | Message-level updates: stop reason and usage counts |
| message_stop | The message is complete |
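In the Claude API streaming format, message_delta carries the stop reason under delta and the usage counts at the top level of the event. A minimal sketch that reads both from a raw event dict. summarize_message_delta is a hypothetical helper, not part of the SDK, and the sample event is hand-written rather than captured output:

```python
from typing import Any, Optional


def summarize_message_delta(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Pull the stop reason and output token count out of a raw message_delta event.

    Returns None for every other event type, so it can be called on each
    event in the streaming loop. Missing fields come back as None.
    """
    if event.get("type") != "message_delta":
        return None
    return {
        "stop_reason": event.get("delta", {}).get("stop_reason"),
        "output_tokens": event.get("usage", {}).get("output_tokens"),
    }


# Hand-written event shaped like the API's message_delta
sample = {
    "type": "message_delta",
    "delta": {"stop_reason": "tool_use", "stop_sequence": None},
    "usage": {"output_tokens": 42},
}
print(summarize_message_delta(sample))
# {'stop_reason': 'tool_use', 'output_tokens': 42}
print(summarize_message_delta({"type": "message_stop"}))
# None
```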

Message Flow

StreamEvent (message_start)
StreamEvent (content_block_start)  ← text block
StreamEvent (content_block_delta)  ← text chunks ...
StreamEvent (content_block_stop)
StreamEvent (content_block_start)  ← tool_use block
StreamEvent (content_block_delta)  ← input_json_delta chunks ...
StreamEvent (content_block_stop)
StreamEvent (message_delta / message_stop)
AssistantMessage                   ← complete message
... tool executes ...
ResultMessage                      ← final result

Without streaming enabled, no StreamEvent messages are emitted; you receive only complete messages: SystemMessage, AssistantMessage, ResultMessage, and a compaction marker (SDKCompactBoundaryMessage in TypeScript, a SystemMessage with subtype "compact_boundary" in Python).

Streaming Text

Look for content_block_delta → delta.type == "text_delta" → delta.text.
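When the agent uses tools, one query() run spans several API messages, one per assistant turn, and each begins with its own message_start. A minimal sketch, assuming you pass in the raw event dicts (StreamEvent.event) in arrival order, that keeps each turn's text separate instead of gluing everything into one string. text_per_message and the sample events are illustrative, not part of the SDK:

```python
from typing import Any, Iterable


def text_per_message(events: Iterable[dict[str, Any]]) -> list[str]:
    """Return the streamed text of each API message, in arrival order.

    Every message_start opens a new entry; text_delta chunks are appended
    to the most recent entry. Other event types are ignored.
    """
    messages: list[str] = []
    for event in events:
        etype = event.get("type")
        if etype == "message_start":
            messages.append("")
        elif etype == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                if not messages:  # tolerate a stream captured mid-message
                    messages.append("")
                messages[-1] += delta.get("text", "")
    return messages


def text_delta(text: str) -> dict[str, Any]:
    # Builds a hand-written raw text_delta event for the demo
    return {"type": "content_block_delta", "index": 0,
            "delta": {"type": "text_delta", "text": text}}


events = [
    {"type": "message_start", "message": {}},
    text_delta("Let me "),
    text_delta("check."),
    {"type": "message_stop"},
    # ... the tool runs; the agent's next turn arrives as a new message ...
    {"type": "message_start", "message": {}},
    text_delta("All "),
    text_delta("done."),
    {"type": "message_stop"},
]
print(text_per_message(events))  # ['Let me check.', 'All done.']
```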

Streaming Tool Calls

Three events to watch:

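| Event | Action |
|---|---|
| content_block_start with content_block.type == "tool_use" | A tool call is starting; capture content_block.name |
| content_block_delta with delta.type == "input_json_delta" | Append delta.partial_json to a buffer |
| content_block_stop (while a tool block is open) | The tool call is complete; parse the accumulated JSON |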
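The three steps can be sketched as a small state holder keyed by the index field that every content_block_* event carries in the Claude API stream. ToolCallAccumulator is a hypothetical helper, not part of the SDK: feed it each StreamEvent.event dict in order. The tool id, name and input in the demo are made up:

```python
import json
from typing import Any, Optional


class ToolCallAccumulator:
    """Rebuild tool_use calls from raw stream events (illustrative, not SDK API).

    feed() takes one StreamEvent.event dict at a time and returns
    {"id", "name", "input"} when a tool_use block closes, otherwise None.
    """

    def __init__(self) -> None:
        # Open tool_use blocks, keyed by the event's content-block index
        self._open: dict[int, dict[str, Any]] = {}

    def feed(self, event: dict[str, Any]) -> Optional[dict[str, Any]]:
        etype = event.get("type")
        index = event.get("index")
        if etype == "content_block_start":
            block = event.get("content_block", {})
            if block.get("type") == "tool_use":
                # Tool starting: capture its id and name, start an empty buffer
                self._open[index] = {"id": block.get("id"), "name": block.get("name"), "parts": []}
        elif etype == "content_block_delta" and index in self._open:
            delta = event.get("delta", {})
            if delta.get("type") == "input_json_delta":
                self._open[index]["parts"].append(delta.get("partial_json", ""))
        elif etype == "content_block_stop" and index in self._open:
            state = self._open.pop(index)
            raw = "".join(state["parts"])
            # Parse only now: single fragments are generally not valid JSON.
            # An empty buffer is treated as a call with no arguments.
            return {"id": state["id"], "name": state["name"],
                    "input": json.loads(raw) if raw else {}}
        return None


acc = ToolCallAccumulator()
stream = [
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_demo", "name": "Read", "input": {}}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"file_pa'}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": 'th": "README.md"}'}},
    {"type": "content_block_stop", "index": 1},
]
calls = [call for call in map(acc.feed, stream) if call is not None]
print(calls)
# [{'id': 'toolu_demo', 'name': 'Read', 'input': {'file_path': 'README.md'}}]
```

Keying by index rather than a single buffer means a content_block_stop for a text block is ignored automatically, since only tool_use blocks are ever registered as open.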

Building a Streaming UI

Use an in_tool flag to switch between rendering text tokens and showing a [Using ToolName...] status indicator:

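import asyncio
import sys

from claude_agent_sdk import ClaudeAgentOptions, query
from claude_agent_sdk.types import StreamEvent

async def main():
    options = ClaudeAgentOptions(include_partial_messages=True)
    in_tool = False  # True while a tool_use block is streaming
    async for message in query(prompt="...", options=options):
        if isinstance(message, StreamEvent):
            event = message.event
            t = event.get("type")
            if t == "content_block_start":
                cb = event.get("content_block", {})
                if cb.get("type") == "tool_use":
                    # Show a status line instead of the raw tool input JSON
                    print(f"\n[Using {cb['name']}...]", end="", flush=True)
                    in_tool = True
            elif t == "content_block_delta":
                d = event.get("delta", {})
                if d.get("type") == "text_delta" and not in_tool:
                    # Render text tokens as they arrive
                    sys.stdout.write(d.get("text", ""))
                    sys.stdout.flush()
            elif t == "content_block_stop" and in_tool:
                print(" done", flush=True)
                in_tool = False

asyncio.run(main())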

Known Limitations

  • Extended thinking: when max_thinking_tokens / maxThinkingTokens is set, StreamEvent messages are not emitted — only complete messages arrive. Thinking is off by default, so streaming works unless you explicitly enable it.
  • Structured output: JSON result appears only in the final ResultMessage.structured_output, never as streaming deltas. See wiki/agent-sdk/structured-outputs.

Key Takeaways

  • Set include_partial_messages=True (Python) / includePartialMessages: true (TypeScript) to opt in.
  • Events arrive as StreamEvent wrappers around raw Claude API streaming events — you accumulate text/JSON yourself.
  • Text: content_block_delta → text_delta → delta.text
  • Tool input: content_block_delta → input_json_delta → delta.partial_json
  • The complete AssistantMessage still arrives after all deltas — you don't have to reconstruct it.
  • Incompatible with extended thinking; structured output only in final ResultMessage.

Sources

  • raw/Stream responses in real-time.md — official Agent SDK streaming output docs