| title | aliases | tags | sources | created | updated |
|---|---|---|---|---|---|
| Observability with OpenTelemetry | | | | 2026-04-17 | 2026-04-17 |
Observability with OpenTelemetry
Export traces, metrics, and log events from the Agent SDK to any OTLP-compatible backend (Honeycomb, Datadog, Grafana, Langfuse, self-hosted collector).
How Telemetry Flows
- The SDK runs the Claude Code CLI as a child process; the CLI emits the telemetry, not the SDK itself.
- Configuration is passed via environment variables that the child process inherits.
- Two configuration strategies:
  - Process environment (recommended for production): set the variables in the shell, container, or orchestrator; every `query()` call picks them up automatically.
  - Per-call `options.env`: use when different agents need different telemetry settings.
    - Python: `env` is merged on top of the inherited environment.
    - TypeScript: `env` replaces the inherited environment, so always include `...process.env`.
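The merge-versus-replace difference decides what the CLI child process actually receives. The sketch below models it with plain dicts rather than SDK calls; `child_env_python` and `child_env_typescript` are hypothetical helpers written only for illustration, and the sample variables are placeholders.

```python
def child_env_python(inherited: dict[str, str], per_call: dict[str, str]) -> dict[str, str]:
    """Model the Python rule: per-call env is merged on top of the inherited env."""
    return {**inherited, **per_call}


def child_env_typescript(per_call: dict[str, str]) -> dict[str, str]:
    """Model the TypeScript rule: per-call env replaces the inherited env entirely."""
    return dict(per_call)


inherited = {"PATH": "/usr/bin", "CLAUDE_CODE_ENABLE_TELEMETRY": "1"}
per_call = {"OTEL_SERVICE_NAME": "support-triage-agent"}

merged = child_env_python(inherited, per_call)
replaced = child_env_typescript(per_call)
# The TypeScript fix: spread the inherited env first, like {...process.env, ...}
spread = child_env_typescript({**inherited, **per_call})

print(sorted(merged))  # ['CLAUDE_CODE_ENABLE_TELEMETRY', 'OTEL_SERVICE_NAME', 'PATH']
print(sorted(replaced))  # ['OTEL_SERVICE_NAME']
```

In the replaced case the child no longer sees `CLAUDE_CODE_ENABLE_TELEMETRY`, so in this model telemetry configured at the process level would be switched off for that call.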
Three Signals
| Signal | What it contains | Enable with |
|---|---|---|
| Metrics | Token/cost counters, sessions, lines of code, tool decisions | OTEL_METRICS_EXPORTER |
| Log events | Structured records per prompt, API request, error, tool result | OTEL_LOGS_EXPORTER |
| Traces (beta) | Spans per interaction, model request, tool call, hook | OTEL_TRACES_EXPORTER + CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1 |
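As a sketch of how the table translates into configuration, the hypothetical `signal_env` helper below emits the exporter variable for each requested signal and adds the beta flag only when traces are requested. It is not part of the SDK; the default `otlp` exporter value mirrors the configuration used on this page.

```python
# Exporter variable per signal, as listed in the table above
EXPORTER_VARS = {
    "metrics": "OTEL_METRICS_EXPORTER",
    "logs": "OTEL_LOGS_EXPORTER",
    "traces": "OTEL_TRACES_EXPORTER",
}


def signal_env(*signals: str, exporter: str = "otlp") -> dict[str, str]:
    """Hypothetical helper: build the env vars that enable the given signals."""
    unknown = set(signals) - set(EXPORTER_VARS)
    if unknown:
        raise ValueError(f"unknown signal(s): {sorted(unknown)}")
    env = {"CLAUDE_CODE_ENABLE_TELEMETRY": "1"}  # telemetry is off without this
    for signal in signals:
        env[EXPORTER_VARS[signal]] = exporter
    if "traces" in signals:
        # Traces are beta and need the extra flag
        env["CLAUDE_CODE_ENHANCED_TELEMETRY_BETA"] = "1"
    return env


metrics_and_traces = signal_env("metrics", "traces")
```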
Enabling Telemetry
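Telemetry is off by default. To turn it on, set `CLAUDE_CODE_ENABLE_TELEMETRY=1` plus at least one exporter; this configuration enables all three signals: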
import asyncio

from claude_agent_sdk import ClaudeAgentOptions, query

OTEL_ENV = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "CLAUDE_CODE_ENHANCED_TELEMETRY_BETA": "1",  # required for traces
    "OTEL_TRACES_EXPORTER": "otlp",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.example.com:4318",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer your-token",
}


async def main() -> None:
    # In Python, env is merged on top of the inherited process environment
    options = ClaudeAgentOptions(env=OTEL_ENV)
    async for message in query(prompt="...", options=options):
        print(message)


asyncio.run(main())
Do not use the `console` exporter: the SDK uses stdout as its message channel. For local inspection, use a local OTLP collector or Jaeger instead.
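Before launching an agent, it can help to check the environment for the mistakes this page warns about. The sketch below is a hypothetical pre-flight check, not part of the SDK: it flags a missing enable flag, no exporter at all, traces without the beta flag, and any exporter set to `console`. Treating each exporter value as a comma-separated list is an assumption based on the usual OpenTelemetry convention.

```python
EXPORTER_ENV_VARS = ("OTEL_METRICS_EXPORTER", "OTEL_LOGS_EXPORTER", "OTEL_TRACES_EXPORTER")


def telemetry_problems(env: dict[str, str]) -> list[str]:
    """Hypothetical pre-flight check for the configuration rules on this page."""
    problems = []
    if env.get("CLAUDE_CODE_ENABLE_TELEMETRY") != "1":
        problems.append("CLAUDE_CODE_ENABLE_TELEMETRY must be '1'")
    # Assumption: each exporter variable may hold a comma-separated list
    exporters = {
        var: [name.strip() for name in env[var].split(",") if name.strip()]
        for var in EXPORTER_ENV_VARS
        if env.get(var)
    }
    if not exporters:
        problems.append("set at least one OTEL_*_EXPORTER")
    if "OTEL_TRACES_EXPORTER" in exporters and env.get("CLAUDE_CODE_ENHANCED_TELEMETRY_BETA") != "1":
        problems.append("traces need CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1")
    for var, names in exporters.items():
        if "console" in names:
            problems.append(f"{var}: console exporter writes to stdout, the SDK message channel")
    return problems
```

An empty list means the configuration passes these checks; it does not prove the collector is reachable.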
Flushing Short-Lived Calls
Default export intervals (metrics: 60 s; traces and logs: 5 s) can be longer than a short task runs. For short tasks, lower them (values are in milliseconds):
"OTEL_METRIC_EXPORT_INTERVAL": "1000", # ms
"OTEL_LOGS_EXPORT_INTERVAL": "1000",
"OTEL_TRACES_EXPORT_INTERVAL": "1000",
- The CLI flushes on clean exit, but the flush is bounded by a timeout, so spans can be dropped if the collector is slow.
- Spans not yet exported are lost entirely if the process is killed before the CLI shuts down.
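The interval overrides above can be applied with a small helper. This is a sketch under the assumption that one shared interval for all three signals is acceptable; `with_fast_export` is a hypothetical name. It returns a new dict so the base configuration stays unchanged, and it writes the values as strings because environment variables are strings.

```python
INTERVAL_VARS = (
    "OTEL_METRIC_EXPORT_INTERVAL",
    "OTEL_LOGS_EXPORT_INTERVAL",
    "OTEL_TRACES_EXPORT_INTERVAL",
)


def with_fast_export(env: dict[str, str], interval_ms: int = 1000) -> dict[str, str]:
    """Hypothetical helper: copy env and set every export interval to interval_ms."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    fast = dict(env)  # leave the caller's dict untouched
    for var in INTERVAL_VARS:
        fast[var] = str(interval_ms)  # env values must be strings
    return fast


base = {"CLAUDE_CODE_ENABLE_TELEMETRY": "1", "OTEL_METRICS_EXPORTER": "otlp"}
short_task_env = with_fast_export(base)
```

With the 60 s metrics default, a task that finishes in 10 seconds never reaches a periodic metrics export and depends entirely on the bounded flush at exit; a 1 s interval gives it several chances to export first.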
Span Names (Traces)
| Span | Wraps |
|---|---|
| `claude_code.interaction` | One full agent turn (prompt → response) |
| `claude_code.llm_request` | Single Claude API call; carries model, latency, token counts |
| `claude_code.tool` | Tool invocation; child spans: `claude_code.tool.blocked_on_user`, `claude_code.tool.execution` |
| `claude_code.hook` | Hook execution |
- All spans carry `session.id`; filter on it to group a multi-turn session into one timeline.
- Set `OTEL_METRICS_INCLUDE_SESSION_ID=false` to omit the attribute.
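Grouping by `session.id` is normally a query in your tracing backend. As a backend-neutral sketch, the code below does the same over span records that have already been exported and simplified into dicts; the record shape (`name`, `start_ns`, `attributes`) is a hypothetical assumption, not the OTLP wire format.

```python
from collections import defaultdict


def session_timelines(spans: list[dict]) -> dict[str, list[str]]:
    """Group span names by session.id, ordered by start time within each session."""
    by_session: dict[str, list[dict]] = defaultdict(list)
    for span in spans:
        session_id = span["attributes"].get("session.id")
        if session_id is None:
            continue  # attribute omitted, e.g. via OTEL_METRICS_INCLUDE_SESSION_ID=false
        by_session[session_id].append(span)
    return {
        sid: [s["name"] for s in sorted(items, key=lambda s: s["start_ns"])]
        for sid, items in by_session.items()
    }


spans = [
    {"name": "claude_code.tool", "start_ns": 30, "attributes": {"session.id": "s1"}},
    {"name": "claude_code.interaction", "start_ns": 10, "attributes": {"session.id": "s1"}},
    {"name": "claude_code.llm_request", "start_ns": 20, "attributes": {"session.id": "s1"}},
    {"name": "claude_code.interaction", "start_ns": 5, "attributes": {"session.id": "s2"}},
    {"name": "claude_code.hook", "start_ns": 40, "attributes": {}},
]
timelines = session_timelines(spans)
print(timelines)
# {'s1': ['claude_code.interaction', 'claude_code.llm_request', 'claude_code.tool'], 's2': ['claude_code.interaction']}
```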
Tagging Telemetry
When running multiple agents, override the default `service.name` (`claude-code`) so each agent is distinguishable in your backend:
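from claude_agent_sdk import ClaudeAgentOptions

# Merged on top of the inherited environment (Python), so telemetry
# variables set at the process level still apply
options = ClaudeAgentOptions(
    env={
        "OTEL_SERVICE_NAME": "support-triage-agent",
        "OTEL_RESOURCE_ATTRIBUTES": "service.version=1.4.0,deployment.environment=production",
    },
)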
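`OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs, so writing it by hand gets error-prone as attributes accumulate. A minimal sketch, assuming the OpenTelemetry convention that values are percent-encoded: the hypothetical `agent_telemetry_env` helper builds the per-agent service name and resource attributes from a dict.

```python
from urllib.parse import quote


def resource_attributes(attrs: dict[str, str]) -> str:
    """Join attributes as key=value pairs, percent-encoding each value."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in attrs.items())


def agent_telemetry_env(service_name: str, attrs: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: per-agent service name plus resource attributes."""
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_RESOURCE_ATTRIBUTES": resource_attributes(attrs),
    }


env = agent_telemetry_env(
    "support-triage-agent",
    {"service.version": "1.4.0", "deployment.environment": "production"},
)
print(env["OTEL_RESOURCE_ATTRIBUTES"])
# service.version=1.4.0,deployment.environment=production
```

The result can be passed as `ClaudeAgentOptions(env=env)`; in TypeScript, spread `...process.env` into it first.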
Sensitive Data Controls
Content is not recorded by default. Opt-in variables:
| Variable | Adds |
|---|---|
| `OTEL_LOG_USER_PROMPTS=1` | Prompt text on events and the interaction span |
| `OTEL_LOG_TOOL_DETAILS=1` | Tool input arguments (file paths, shell commands) on `tool_result` events |
| `OTEL_LOG_TOOL_CONTENT=1` | Full tool input/output bodies as span events (max 60 KB; requires tracing to be enabled) |
Leave these variables unset unless your observability pipeline is approved to store the data your agent handles.
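When different agents need different content policies, a helper can keep each opt-in explicit. This is a hypothetical sketch, not an SDK function: it adds only the variables that are requested and refuses `OTEL_LOG_TOOL_CONTENT` unless tracing is on, since tool content is recorded as span events. The "tracing is on" test (a traces exporter plus the beta flag) is an assumption based on the trace settings described on this page.

```python
def content_logging_env(
    env: dict[str, str],
    *,
    prompts: bool = False,
    tool_details: bool = False,
    tool_content: bool = False,
) -> dict[str, str]:
    """Hypothetical helper: add only the content opt-ins that are explicitly requested."""
    tracing_on = (
        bool(env.get("OTEL_TRACES_EXPORTER"))
        and env.get("CLAUDE_CODE_ENHANCED_TELEMETRY_BETA") == "1"
    )
    if tool_content and not tracing_on:
        raise ValueError("OTEL_LOG_TOOL_CONTENT requires tracing to be enabled")
    result = dict(env)  # content stays unrecorded unless a flag is passed
    if prompts:
        result["OTEL_LOG_USER_PROMPTS"] = "1"
    if tool_details:
        result["OTEL_LOG_TOOL_DETAILS"] = "1"
    if tool_content:
        result["OTEL_LOG_TOOL_CONTENT"] = "1"
    return result
```

Defaulting every flag to `False` keeps the safe behavior the default; a reviewer can see each opt-in at the call site.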
Key Takeaways
- Telemetry comes from the CLI child process, not the SDK; configure it via environment variables.
- Set `CLAUDE_CODE_ENABLE_TELEMETRY=1` plus at least one `OTEL_*_EXPORTER`.
- Traces require an additional beta flag: `CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1`.
- TypeScript `options.env` replaces the environment, so always spread `...process.env`.
- Lower export intervals for short-lived agent calls to avoid dropped spans.
- Content (prompts, tool I/O) is redacted by default; three opt-in variables add it back.
- Use `OTEL_SERVICE_NAME` to distinguish multiple agents in the same collector.
Related
- wiki/agent-sdk/hosting-production: set OTEL variables at the container/orchestrator level
- wiki/agent-sdk/agent-loop: understand what each span represents
- wiki/agent-sdk/hooks-guide: `claude_code.hook` spans wrap hook executions
- wiki/claude-code/monitoring-usage: full list of env vars, metric names, and event names
Sources
raw/Observability with OpenTelemetry.md