obsidian/wiki/agent-sdk/observability-opentelemetry.md
2026-04-17 13:01:42 +01:00

title: Observability with OpenTelemetry
aliases: otel-agent-sdk, agent-sdk-telemetry, opentelemetry-traces
tags: agent-sdk, observability, opentelemetry, monitoring, tracing, metrics
sources: raw/Observability with OpenTelemetry.md
created: 2026-04-17
updated: 2026-04-17

Observability with OpenTelemetry

Export traces, metrics, and log events from the Agent SDK to any OTLP-compatible backend, such as Honeycomb, Datadog, Grafana, Langfuse, or a self-hosted collector.

How Telemetry Flows

  • The SDK runs the Claude Code CLI as a child process — the CLI emits telemetry, not the SDK itself
  • Configuration is passed via environment variables inherited by the child process
  • Two configuration strategies:
    • Process environment (recommended for production): set vars in shell/container/orchestrator — all query() calls pick them up automatically
    • Per-call options.env: use when different agents need different telemetry settings
      • Python: env merges on top of inherited environment
      • TypeScript: env replaces inherited environment — always include ...process.env
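The difference between the two per-call behaviours can be illustrated with plain dictionaries. This is a minimal sketch of the semantics described above, not the SDK's own code; `merged_env` and `replaced_env` are hypothetical helper names introduced only for illustration.

```python
import os
from typing import Optional


def merged_env(overrides: dict, inherited: Optional[dict] = None) -> dict:
    """Layer overrides on top of the inherited environment (the Python behaviour)."""
    base = dict(os.environ) if inherited is None else dict(inherited)
    base.update(overrides)
    return base


def replaced_env(overrides: dict) -> dict:
    """Use only the overrides (the TypeScript behaviour when process.env is not spread in)."""
    return dict(overrides)


inherited = {"PATH": "/usr/bin", "OTEL_SERVICE_NAME": "default-agent"}
overrides = {"OTEL_SERVICE_NAME": "support-triage-agent"}

print(merged_env(overrides, inherited))  # PATH survives, service name is overridden
print(replaced_env(overrides))           # PATH is gone
```

In the replace case the child process loses `PATH` and every other inherited variable, which is why the TypeScript note above insists on spreading `...process.env`.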

Three Signals

| Signal | What it contains | Enable with |
| --- | --- | --- |
| Metrics | Token/cost counters, sessions, lines of code, tool decisions | OTEL_METRICS_EXPORTER |
| Log events | Structured records per prompt, API request, error, tool result | OTEL_LOGS_EXPORTER |
| Traces (beta) | Spans per interaction, model request, tool call, hook | OTEL_TRACES_EXPORTER + CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1 |
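As a rough sketch, the table can be turned into a small helper that builds the environment for a chosen set of signals. `SIGNAL_VARS` and `signal_env` are hypothetical names, the `otlp` exporter value is an assumption taken from the example later in this note, and `CLAUDE_CODE_ENABLE_TELEMETRY` is the master switch described under Enabling Telemetry.

```python
# Hypothetical mapping from each signal in the table to the variables that enable it.
SIGNAL_VARS = {
    "metrics": {"OTEL_METRICS_EXPORTER": "otlp"},
    "logs": {"OTEL_LOGS_EXPORTER": "otlp"},
    "traces": {
        "OTEL_TRACES_EXPORTER": "otlp",
        "CLAUDE_CODE_ENHANCED_TELEMETRY_BETA": "1",  # traces are beta and need this flag
    },
}


def signal_env(*signals: str) -> dict:
    """Build an env dict that enables only the requested signals."""
    env = {"CLAUDE_CODE_ENABLE_TELEMETRY": "1"}
    for name in signals:
        if name not in SIGNAL_VARS:
            raise ValueError(f"unknown signal: {name!r}")
        env.update(SIGNAL_VARS[name])
    return env


print(signal_env("metrics", "traces"))
```

Keeping the beta flag next to the traces exporter makes it harder to enable one without the other.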

Enabling Telemetry

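Telemetry is off by default. At minimum, set CLAUDE_CODE_ENABLE_TELEMETRY=1 and at least one OTEL_*_EXPORTER. The example below enables all three signals: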

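import asyncio

from claude_agent_sdk import ClaudeAgentOptions, query

# Environment for the CLI child process, which is what actually emits telemetry.
OTEL_ENV = {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "CLAUDE_CODE_ENHANCED_TELEMETRY_BETA": "1",   # required for traces
    "OTEL_TRACES_EXPORTER": "otlp",
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.example.com:4318",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer your-token",
}


async def main() -> None:
    # In Python, env is merged on top of the inherited environment.
    options = ClaudeAgentOptions(env=OTEL_ENV)
    async for message in query(prompt="...", options=options):
        print(message)


asyncio.run(main())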

Do not use the console exporter: the SDK uses stdout as its message channel, so exporter output written there would interfere with it. For local inspection, use a local OTLP collector or Jaeger instead.
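A guard like the following could catch a console exporter before the environment reaches the SDK. It is a sketch only: `console_exporters` is a hypothetical helper, and treating each exporter variable as a possible comma-separated list is an assumption based on the general OpenTelemetry convention.

```python
EXPORTER_VARS = (
    "OTEL_TRACES_EXPORTER",
    "OTEL_METRICS_EXPORTER",
    "OTEL_LOGS_EXPORTER",
)


def console_exporters(env: dict) -> list:
    """Return the exporter variables whose value includes 'console'."""
    offenders = []
    for var in EXPORTER_VARS:
        # Assumption: a value may be a comma-separated list such as "console,otlp".
        values = [v.strip().lower() for v in env.get(var, "").split(",")]
        if "console" in values:
            offenders.append(var)
    return offenders


bad = console_exporters({"OTEL_LOGS_EXPORTER": "console,otlp"})
if bad:
    print("console exporter configured on:", ", ".join(bad))
```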

Flushing Short-Lived Calls

Default export intervals are slow (metrics: 60s, traces/logs: 5s). For short tasks, lower the intervals:

"OTEL_METRIC_EXPORT_INTERVAL": "1000",   # ms
"OTEL_LOGS_EXPORT_INTERVAL": "1000",     # ms
"OTEL_TRACES_EXPORT_INTERVAL": "1000",   # ms

  • The CLI flushes on clean exit, but the flush is bounded by a timeout, so spans can be dropped if the collector is slow
  • Spans are lost entirely if the process is killed before CLI shutdown
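One way to apply these intervals is a small wrapper that copies an existing env dict and adds the three variables. `with_fast_flush` is a hypothetical helper name; the 1000 ms default simply mirrors the values shown above.

```python
INTERVAL_VARS = (
    "OTEL_METRIC_EXPORT_INTERVAL",
    "OTEL_LOGS_EXPORT_INTERVAL",
    "OTEL_TRACES_EXPORT_INTERVAL",
)


def with_fast_flush(env: dict, interval_ms: int = 1000) -> dict:
    """Return a copy of env with all three export intervals set to interval_ms."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    fast = dict(env)  # copy so the caller's dict is not mutated
    for var in INTERVAL_VARS:
        fast[var] = str(interval_ms)  # environment values must be strings
    return fast


print(with_fast_flush({"CLAUDE_CODE_ENABLE_TELEMETRY": "1"}))
```

Shorter intervals narrow the window in which data can be lost, but they do not help if the process is killed outright.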

Span Names (Traces)

| Span | Wraps |
| --- | --- |
| claude_code.interaction | One full agent turn (prompt → response) |
| claude_code.llm_request | Single Claude API call; carries model, latency, token counts |
| claude_code.tool | Tool invocation; child spans: claude_code.tool.blocked_on_user, claude_code.tool.execution |
| claude_code.hook | Hook execution |

  • All spans carry session.id; filter on it to group multi-turn sessions into one timeline
  • Set OTEL_METRICS_INCLUDE_SESSION_ID=false to omit the attribute
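Grouping on `session.id` normally happens in the backend's query UI, but the idea can be sketched over exported span records. The record shape here (dicts with `name`, `start_time`, and `attributes`) is an assumption for illustration, not a format the CLI or any backend is known to produce.

```python
from collections import defaultdict


def group_by_session(spans: list) -> dict:
    """Group span records by session.id and order each group by start time."""
    timelines = defaultdict(list)
    for span in spans:
        session_id = span.get("attributes", {}).get("session.id")
        if session_id is None:
            continue  # attribute omitted, e.g. OTEL_METRICS_INCLUDE_SESSION_ID=false
        timelines[session_id].append(span)
    for items in timelines.values():
        items.sort(key=lambda s: s["start_time"])
    return dict(timelines)


spans = [
    {"name": "claude_code.tool", "start_time": 2, "attributes": {"session.id": "a"}},
    {"name": "claude_code.interaction", "start_time": 1, "attributes": {"session.id": "a"}},
    {"name": "claude_code.interaction", "start_time": 3, "attributes": {"session.id": "b"}},
]
for session_id, items in group_by_session(spans).items():
    print(session_id, [s["name"] for s in items])
```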

Tagging Telemetry

Override the default service.name = "claude-code" when running multiple agents:

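from claude_agent_sdk import ClaudeAgentOptions

# Tag this agent's telemetry so it can be told apart in a shared backend.
options = ClaudeAgentOptions(
    env={
        "OTEL_SERVICE_NAME": "support-triage-agent",
        "OTEL_RESOURCE_ATTRIBUTES": "service.version=1.4.0,deployment.environment=production",
    },
)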
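When the attributes come from configuration rather than a hand-written string, a helper along these lines can build the `OTEL_RESOURCE_ATTRIBUTES` value. `format_resource_attributes` is a hypothetical name, and percent-encoding the values is an assumption based on the OpenTelemetry convention for this variable; check it against your exporter if values contain commas or spaces.

```python
from urllib.parse import quote


def format_resource_attributes(attrs: dict) -> str:
    """Join attributes as key=value pairs separated by commas."""
    # Assumption: values are percent-encoded so that commas or equals signs
    # inside a value cannot be mistaken for separators.
    return ",".join(f"{key}={quote(str(value), safe='')}" for key, value in attrs.items())


env = {
    "OTEL_SERVICE_NAME": "support-triage-agent",
    "OTEL_RESOURCE_ATTRIBUTES": format_resource_attributes(
        {"service.version": "1.4.0", "deployment.environment": "production"}
    ),
}
print(env["OTEL_RESOURCE_ATTRIBUTES"])
```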

Sensitive Data Controls

Content is not recorded by default. Opt-in variables:

| Variable | Adds |
| --- | --- |
| OTEL_LOG_USER_PROMPTS=1 | Prompt text on events and interaction span |
| OTEL_LOG_TOOL_DETAILS=1 | Tool input args (file paths, shell commands) on tool_result events |
| OTEL_LOG_TOOL_CONTENT=1 | Full tool input/output bodies as span events (max 60 KB, requires tracing enabled) |

Leave unset unless your observability pipeline is approved to store the data your agent handles.
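To keep these opt-ins explicit, the flags can be set through a function whose arguments all default to off. This is a sketch under stated assumptions: `content_opt_ins` is a hypothetical helper, and "tracing enabled" is interpreted here as both the traces exporter and the beta flag being present in the same dict.

```python
from typing import Optional


def content_opt_ins(
    base: Optional[dict] = None,
    *,
    prompts: bool = False,
    tool_details: bool = False,
    tool_content: bool = False,
) -> dict:
    """Return a copy of base with only the explicitly requested content flags set."""
    env = dict(base or {})
    if prompts:
        env["OTEL_LOG_USER_PROMPTS"] = "1"
    if tool_details:
        env["OTEL_LOG_TOOL_DETAILS"] = "1"
    if tool_content:
        # Assumption: tracing counts as enabled when both variables are present.
        tracing_on = (
            bool(env.get("OTEL_TRACES_EXPORTER"))
            and env.get("CLAUDE_CODE_ENHANCED_TELEMETRY_BETA") == "1"
        )
        if not tracing_on:
            raise ValueError("OTEL_LOG_TOOL_CONTENT requires tracing to be enabled")
        env["OTEL_LOG_TOOL_CONTENT"] = "1"
    return env


print(content_opt_ins(prompts=True))
```

Because every flag defaults to off, recording content takes a deliberate, reviewable change at the call site.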

Key Takeaways

  • Telemetry comes from the CLI child process, not the SDK — configure via env vars
  • Must set CLAUDE_CODE_ENABLE_TELEMETRY=1 plus at least one OTEL_*_EXPORTER
  • Traces require an additional beta flag: CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1
  • TypeScript options.env replaces the environment — always spread ...process.env
  • Lower export intervals for short-lived agent calls to avoid dropped spans
  • Content (prompts, tool I/O) is redacted by default; three opt-in vars add it back
  • Use OTEL_SERVICE_NAME to distinguish multiple agents in the same collector
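The first three takeaways can be checked mechanically before a query runs. The sketch below inspects only the dict it is given, not the inherited process environment, and assumes the flags are set to the literal string "1" as in the examples in this note; `telemetry_problems` is a hypothetical helper name.

```python
EXPORTERS = ("OTEL_TRACES_EXPORTER", "OTEL_METRICS_EXPORTER", "OTEL_LOGS_EXPORTER")


def telemetry_problems(env: dict) -> list:
    """Return human-readable problems found in a telemetry env dict."""
    problems = []
    if env.get("CLAUDE_CODE_ENABLE_TELEMETRY") != "1":
        problems.append("CLAUDE_CODE_ENABLE_TELEMETRY is not set to 1")
    if not any(env.get(var) for var in EXPORTERS):
        problems.append("no OTEL_*_EXPORTER is set")
    if env.get("OTEL_TRACES_EXPORTER") and env.get("CLAUDE_CODE_ENHANCED_TELEMETRY_BETA") != "1":
        problems.append("traces need CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1")
    return problems


for problem in telemetry_problems({"OTEL_TRACES_EXPORTER": "otlp"}):
    print("warning:", problem)
```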

Sources

  • raw/Observability with OpenTelemetry.md