obsidian/wiki/agent-sdk/streaming-output.md
2026-04-17 13:14:32 +01:00

134 lines
5.6 KiB
Markdown

---
title: "Streaming Output in Real-Time"
aliases: [streaming, partial-messages, stream-events]
tags: [agent-sdk, streaming, python, typescript, real-time]
sources: [raw/Stream responses in real-time.md]
created: 2026-04-17
updated: 2026-04-17
---
## Overview
By default the Agent SDK yields complete `AssistantMessage` objects after each full response. Enable **partial message streaming** to receive incremental tokens and tool-call deltas as they arrive.
- **Python**: set `include_partial_messages=True` in `ClaudeAgentOptions`
- **TypeScript**: set `includePartialMessages: true` in options
## How It Works
When streaming is enabled, the SDK emits `StreamEvent` messages wrapping raw Claude API events **before** the final `AssistantMessage`. Your loop must check message type first, then inspect the nested event:
```python
async for message in query(prompt="...", options=options):
if isinstance(message, StreamEvent):
event = message.event
if event.get("type") == "content_block_delta":
delta = event.get("delta", {})
if delta.get("type") == "text_delta":
print(delta.get("text", ""), end="", flush=True)
```
## StreamEvent Structure
| Field | Type | Description |
|-------|------|-------------|
| `uuid` | str | Unique event identifier |
| `session_id` | str | Session identifier |
| `event` | dict | Raw Claude API stream event |
| `parent_tool_use_id` | str \| None | Set when event is from a subagent |
**TypeScript name:** `SDKPartialAssistantMessage` with `type: 'stream_event'`
## Common Event Types
| Event Type | Description |
|------------|-------------|
| `message_start` | New message begins |
| `content_block_start` | New text or tool-use block begins |
| `content_block_delta` | Incremental update (`text_delta` or `input_json_delta`) |
| `content_block_stop` | Block complete |
| `message_delta` | Stop reason, usage counts |
| `message_stop` | Message complete |
## Message Flow
```
StreamEvent (message_start)
StreamEvent (content_block_start) ← text block
StreamEvent (content_block_delta) ← text chunks ...
StreamEvent (content_block_stop)
StreamEvent (content_block_start) ← tool_use block
StreamEvent (content_block_delta) ← input_json_delta chunks ...
StreamEvent (content_block_stop)
StreamEvent (message_delta / message_stop)
AssistantMessage ← complete message
... tool executes ...
ResultMessage ← final result
```
Without streaming enabled you receive: `SystemMessage`, `AssistantMessage`, `ResultMessage`, and `SDKCompactBoundaryMessage` (TypeScript) / `SystemMessage` with subtype `"compact_boundary"` (Python).
## Streaming Text
Look for `content_block_delta``delta.type == "text_delta"``delta.text`.
## Streaming Tool Calls
Three events to watch:
| Event | Action |
|-------|--------|
| `content_block_start` + `content_block.type == "tool_use"` | Tool starting — capture `name` |
| `content_block_delta` + `delta.type == "input_json_delta"` | Accumulate `partial_json` |
| `content_block_stop` | Tool call complete — use accumulated JSON |
## Building a Streaming UI
Use an `in_tool` flag to switch between rendering text tokens and showing a `[Using ToolName...]` status indicator:
```python
in_tool = False
async for message in query(prompt="...", options=options):
if isinstance(message, StreamEvent):
event = message.event
t = event.get("type")
if t == "content_block_start":
cb = event.get("content_block", {})
if cb.get("type") == "tool_use":
print(f"\n[Using {cb['name']}...]", end="", flush=True)
in_tool = True
elif t == "content_block_delta":
d = event.get("delta", {})
if d.get("type") == "text_delta" and not in_tool:
sys.stdout.write(d.get("text", ""))
sys.stdout.flush()
elif t == "content_block_stop" and in_tool:
print(" done", flush=True)
in_tool = False
```
## Known Limitations
- **Extended thinking**: when `max_thinking_tokens` / `maxThinkingTokens` is set, `StreamEvent` messages are **not** emitted — only complete messages arrive. Thinking is off by default, so streaming works unless you explicitly enable it.
- **Structured output**: JSON result appears only in the final `ResultMessage.structured_output`, never as streaming deltas. See [[wiki/agent-sdk/structured-outputs|structured-outputs]].
## Key Takeaways
- Set `include_partial_messages=True` (Python) / `includePartialMessages: true` (TypeScript) to opt in.
- Events arrive as `StreamEvent` wrappers around raw Claude API streaming events — you accumulate text/JSON yourself.
- Text: `content_block_delta``text_delta``delta.text`
- Tool input: `content_block_delta``input_json_delta``delta.partial_json`
- The complete `AssistantMessage` still arrives after all deltas — you don't have to reconstruct it.
- Incompatible with extended thinking; structured output only in final `ResultMessage`.
## Related
- [[wiki/agent-sdk/agent-loop|agent-loop]] — full message lifecycle, turn structure, compaction
- [[wiki/agent-sdk/python-api-reference|python-api-reference]] — `ClaudeAgentOptions`, `StreamEvent`, all types
- [[wiki/agent-sdk/typescript-api-reference|typescript-api-reference]] — `SDKPartialAssistantMessage`, `includePartialMessages`
- [[wiki/agent-sdk/structured-outputs|structured-outputs]] — JSON results from agents (not streaming)
- [[wiki/agent-sdk/user-input-approvals|user-input-approvals]] — `canUseTool`, Python streaming workaround
## Sources
- `raw/Stream responses in real-time.md` — official Agent SDK streaming output docs