134 lines
5.6 KiB
Markdown
134 lines
5.6 KiB
Markdown
---
|
|
title: "Streaming Output in Real-Time"
|
|
aliases: [streaming, partial-messages, stream-events]
|
|
tags: [agent-sdk, streaming, python, typescript, real-time]
|
|
sources: [raw/Stream responses in real-time.md]
|
|
created: 2026-04-17
|
|
updated: 2026-04-17
|
|
---
|
|
|
|
## Overview
|
|
|
|
By default the Agent SDK yields complete `AssistantMessage` objects after each full response. Enable **partial message streaming** to receive incremental tokens and tool-call deltas as they arrive.
|
|
|
|
- **Python**: set `include_partial_messages=True` in `ClaudeAgentOptions`
|
|
- **TypeScript**: set `includePartialMessages: true` in options
|
|
|
|
## How It Works
|
|
|
|
When streaming is enabled, the SDK emits `StreamEvent` messages wrapping raw Claude API events **before** the final `AssistantMessage`. Your loop must check message type first, then inspect the nested event:
|
|
|
|
```python
|
|
async for message in query(prompt="...", options=options):
|
|
if isinstance(message, StreamEvent):
|
|
event = message.event
|
|
if event.get("type") == "content_block_delta":
|
|
delta = event.get("delta", {})
|
|
if delta.get("type") == "text_delta":
|
|
print(delta.get("text", ""), end="", flush=True)
|
|
```
|
|
|
|
## StreamEvent Structure
|
|
|
|
| Field | Type | Description |
|
|
|-------|------|-------------|
|
|
| `uuid` | str | Unique event identifier |
|
|
| `session_id` | str | Session identifier |
|
|
| `event` | dict | Raw Claude API stream event |
|
|
| `parent_tool_use_id` | str \| None | Set when event is from a subagent |
|
|
|
|
**TypeScript name:** `SDKPartialAssistantMessage` with `type: 'stream_event'`
|
|
|
|
## Common Event Types
|
|
|
|
| Event Type | Description |
|
|
|------------|-------------|
|
|
| `message_start` | New message begins |
|
|
| `content_block_start` | New text or tool-use block begins |
|
|
| `content_block_delta` | Incremental update (`text_delta` or `input_json_delta`) |
|
|
| `content_block_stop` | Block complete |
|
|
| `message_delta` | Stop reason, usage counts |
|
|
| `message_stop` | Message complete |
|
|
|
|
## Message Flow
|
|
|
|
```
|
|
StreamEvent (message_start)
|
|
StreamEvent (content_block_start) ← text block
|
|
StreamEvent (content_block_delta) ← text chunks ...
|
|
StreamEvent (content_block_stop)
|
|
StreamEvent (content_block_start) ← tool_use block
|
|
StreamEvent (content_block_delta) ← input_json_delta chunks ...
|
|
StreamEvent (content_block_stop)
|
|
StreamEvent (message_delta / message_stop)
|
|
AssistantMessage ← complete message
|
|
... tool executes ...
|
|
ResultMessage ← final result
|
|
```
|
|
|
|
Without streaming enabled you receive: `SystemMessage`, `AssistantMessage`, `ResultMessage`, and `SDKCompactBoundaryMessage` (TypeScript) / `SystemMessage` with subtype `"compact_boundary"` (Python).
|
|
|
|
## Streaming Text
|
|
|
|
Look for `content_block_delta` → `delta.type == "text_delta"` → `delta.text`.
|
|
|
|
## Streaming Tool Calls
|
|
|
|
Three events to watch:
|
|
|
|
| Event | Action |
|
|
|-------|--------|
|
|
| `content_block_start` + `content_block.type == "tool_use"` | Tool starting — capture `name` |
|
|
| `content_block_delta` + `delta.type == "input_json_delta"` | Accumulate `partial_json` |
|
|
| `content_block_stop` | Tool call complete — use accumulated JSON |
|
|
|
|
## Building a Streaming UI
|
|
|
|
Use an `in_tool` flag to switch between rendering text tokens and showing a `[Using ToolName...]` status indicator:
|
|
|
|
```python
|
|
in_tool = False
|
|
async for message in query(prompt="...", options=options):
|
|
if isinstance(message, StreamEvent):
|
|
event = message.event
|
|
t = event.get("type")
|
|
if t == "content_block_start":
|
|
cb = event.get("content_block", {})
|
|
if cb.get("type") == "tool_use":
|
|
print(f"\n[Using {cb['name']}...]", end="", flush=True)
|
|
in_tool = True
|
|
elif t == "content_block_delta":
|
|
d = event.get("delta", {})
|
|
if d.get("type") == "text_delta" and not in_tool:
|
|
sys.stdout.write(d.get("text", ""))
|
|
sys.stdout.flush()
|
|
elif t == "content_block_stop" and in_tool:
|
|
print(" done", flush=True)
|
|
in_tool = False
|
|
```
|
|
|
|
## Known Limitations
|
|
|
|
- **Extended thinking**: when `max_thinking_tokens` / `maxThinkingTokens` is set, `StreamEvent` messages are **not** emitted — only complete messages arrive. Thinking is off by default, so streaming works unless you explicitly enable it.
|
|
- **Structured output**: JSON result appears only in the final `ResultMessage.structured_output`, never as streaming deltas. See [[wiki/agent-sdk/structured-outputs|structured-outputs]].
|
|
|
|
## Key Takeaways
|
|
|
|
- Set `include_partial_messages=True` (Python) / `includePartialMessages: true` (TypeScript) to opt in.
|
|
- Events arrive as `StreamEvent` wrappers around raw Claude API streaming events — you accumulate text/JSON yourself.
|
|
- Text: `content_block_delta` → `text_delta` → `delta.text`
|
|
- Tool input: `content_block_delta` → `input_json_delta` → `delta.partial_json`
|
|
- The complete `AssistantMessage` still arrives after all deltas — you don't have to reconstruct it.
|
|
- Incompatible with extended thinking; structured output only in final `ResultMessage`.
|
|
|
|
## Related
|
|
|
|
- [[wiki/agent-sdk/agent-loop|agent-loop]] — full message lifecycle, turn structure, compaction
|
|
- [[wiki/agent-sdk/python-api-reference|python-api-reference]] — `ClaudeAgentOptions`, `StreamEvent`, all types
|
|
- [[wiki/agent-sdk/typescript-api-reference|typescript-api-reference]] — `SDKPartialAssistantMessage`, `includePartialMessages`
|
|
- [[wiki/agent-sdk/structured-outputs|structured-outputs]] — JSON results from agents (not streaming)
|
|
- [[wiki/agent-sdk/user-input-approvals|user-input-approvals]] — `canUseTool`, Python streaming workaround
|
|
|
|
## Sources
|
|
|
|
- `raw/Stream responses in real-time.md` — official Agent SDK streaming output docs
|