| title | aliases | tags | sources | created | updated |
|---|---|---|---|---|---|
| Hosting the Agent SDK in Production | | | | 2026-04-17 | 2026-04-17 |
Hosting the Agent SDK in Production
The Claude Agent SDK is not stateless — it runs a persistent shell process that executes commands and manages files. This makes hosting it fundamentally different from calling a REST API.
System Requirements
| Resource | Minimum |
|---|---|
| Runtime | Python 3.10+ or Node.js 18+ |
| Node.js | Always required (SDK spawns Claude Code CLI internally; bundled, no separate install) |
| RAM | 1 GiB recommended |
| Disk | 5 GiB recommended |
| CPU | 1 core recommended |
| Network | Outbound HTTPS to api.anthropic.com; optional MCP server access |
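The minimums in the table can be turned into a startup preflight check. The following is a minimal sketch using only the Python standard library; the function names `check_requirements` and `detect_ram_bytes` are illustrative and not part of the SDK, and the network and Node.js rows are not checked here.

```python
import os
import shutil
import sys

GIB = 1024 ** 3

# Minimums taken from the table above.
MIN_PYTHON = (3, 10)
MIN_DISK_BYTES = 5 * GIB
MIN_RAM_BYTES = 1 * GIB
MIN_CPUS = 1


def check_requirements(python_version, free_disk_bytes, ram_bytes, cpu_count):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.10+ required")
    if free_disk_bytes < MIN_DISK_BYTES:
        problems.append("less than 5 GiB free disk")
    # ram_bytes is None when the platform does not expose it; skip the check then.
    if ram_bytes is not None and ram_bytes < MIN_RAM_BYTES:
        problems.append("less than 1 GiB RAM")
    if (cpu_count or 0) < MIN_CPUS:
        problems.append("fewer than 1 CPU core")
    return problems


def detect_ram_bytes():
    """Best-effort total RAM via os.sysconf (Linux and most Unix); None elsewhere."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    issues = check_requirements(
        sys.version_info,
        shutil.disk_usage("/").free,
        detect_ram_bytes(),
        os.cpu_count(),
    )
    print("OK" if not issues else "\n".join(issues))
```

Inside a container, `os.sysconf` typically reports the host's memory rather than the container's cgroup limit, so the RAM check should be treated as approximate.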
Sandbox Provider Options
Managed container sandboxes purpose-built for AI code execution:
- Modal Sandbox: has a demo implementation (Slack GIF creator)
- Cloudflare Sandboxes: open-source `sandbox-sdk`
- Daytona
- E2B
- Fly Machines
- Vercel Sandbox
For self-hosted isolation (Docker, gVisor, Firecracker) see the Secure Deployment guide.
Deployment Patterns
Pattern 1: Ephemeral Sessions
New container per task, destroyed on completion. Best for one-off tasks.
- Bug investigation & fix
- Invoice/document processing
- Translation batches
- Image/video transformations
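The create-run-destroy lifecycle of an ephemeral session can be sketched as a context manager. In this hedged example a temporary directory stands in for the container; `ephemeral_workspace` and `run_task` are hypothetical names, and a real deployment would call the chosen sandbox provider's create and destroy operations at the same two points.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_workspace(task_id):
    """Create a fresh workspace for one task and always destroy it afterwards.

    A temp directory stands in for the container here (an assumption made
    for illustration only).
    """
    workspace = Path(tempfile.mkdtemp(prefix=f"agent-{task_id}-"))
    try:
        yield workspace
    finally:
        # Teardown runs even if the task raised, so nothing leaks between tasks.
        shutil.rmtree(workspace, ignore_errors=True)


def run_task(task_id, handler):
    """Run handler(workspace) inside a workspace that exists only for this call."""
    with ephemeral_workspace(task_id) as workspace:
        return handler(workspace)
```

Putting the teardown in `finally` is the point of the pattern: a failed task still releases its environment, so the next task starts clean.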
Pattern 2: Long-Running Sessions
Persistent containers, often running multiple Claude Agent processes per container. Best for proactive agents that act without user input.
- Email triage agent
- Per-user site builder (exposes container ports)
- High-frequency chatbots (Slack, etc.)
Pattern 3: Hybrid Sessions
Ephemeral containers hydrated from a database or SDK session resumption. Best for intermittent-interaction workflows.
- Personal project manager with context persistence
- Deep multi-hour research (save, resume)
- Multi-turn customer support tickets
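One way to hydrate an ephemeral container from a database is to store a session identifier per conversation and pass it back on the next interaction. The sketch below uses SQLite from the standard library; `SessionStore`, `handle_message`, and the `run_agent` callable are hypothetical names, and `run_agent` stands in for whatever code wraps the SDK's session resumption.

```python
import sqlite3


class SessionStore:
    """Maps an external conversation key (e.g. a ticket ID) to a saved session ID."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "conversation_key TEXT PRIMARY KEY, session_id TEXT NOT NULL)"
        )

    def save(self, conversation_key, session_id):
        # Replace any earlier row so the most recent session wins.
        self.db.execute(
            "INSERT OR REPLACE INTO sessions (conversation_key, session_id) "
            "VALUES (?, ?)",
            (conversation_key, session_id),
        )
        self.db.commit()

    def load(self, conversation_key):
        row = self.db.execute(
            "SELECT session_id FROM sessions WHERE conversation_key = ?",
            (conversation_key,),
        ).fetchone()
        return row[0] if row else None


def handle_message(store, conversation_key, message, run_agent):
    """Hydrate, run one interaction, persist.

    run_agent is a hypothetical callable (message, resume_session_id) ->
    (reply, session_id) that would wrap the SDK call inside a fresh container.
    """
    previous = store.load(conversation_key)  # None on the first interaction
    reply, session_id = run_agent(message, previous)
    store.save(conversation_key, session_id)
    return reply
```

Because the only durable state is the row in the store, the container itself can be destroyed between interactions without losing context.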
Pattern 4: Single Container, Multiple Agents
Multiple SDK processes share one container. This is the least common pattern because it requires coordination to keep agents from overwriting each other's work.
- Agent simulations (games, multi-agent environments)
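The coordination requirement could be met in several ways; the sketch below shows two simple ones, assuming agents share a filesystem: a private directory per agent, and an atomically created lock file for anything that is genuinely shared. The names `agent_workspace` and `exclusive_claim` are illustrative, not SDK functions.

```python
import os
from contextlib import contextmanager
from pathlib import Path


def agent_workspace(root, agent_id):
    """Give each agent a private directory under a shared root."""
    path = Path(root) / f"agent-{agent_id}"
    path.mkdir(parents=True, exist_ok=True)
    return path


@contextmanager
def exclusive_claim(shared_file):
    """Claim a shared file via an atomically created lock file.

    os.O_EXCL makes creation fail if the lock already exists, so only one
    process can hold the claim at a time. Raises FileExistsError if taken.
    """
    lock_path = Path(str(shared_file) + ".lock")
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the holder's PID to help debug stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        lock_path.unlink()
```

This version fails fast instead of waiting, so a caller that loses the race has to retry or skip. A lock left behind by a crashed process must be cleaned up manually.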
Key Operational Notes
- Communication: Expose HTTP/WebSocket ports from within the container to reach SDK instances externally.
- Cost: Token usage dominates; container overhead starts at roughly $0.05/hr, depending on provider.
- Idle shutdown: Tune idle timeout per provider based on expected user response cadence.
- CLI versioning: Claude Code CLI uses semver — breaking changes are versioned, minor updates are safe to auto-apply.
- Monitoring: Containers are standard servers — use your existing backend logging infrastructure.
- Session timeout: No hard timeout, but set `maxTurns` to prevent infinite loops.
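The idle-shutdown note can be illustrated with a small tracker that a supervisor process polls. This is a sketch under stated assumptions: `IdleTracker` is a hypothetical helper, the timeout value is whatever suits the expected user response cadence, and the actual shutdown call depends on the provider.

```python
import time


class IdleTracker:
    """Tracks time since the last activity so a supervisor can decide to shut down."""

    def __init__(self, idle_timeout_s, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_activity = clock()

    def touch(self):
        """Record activity (for example, an incoming user message)."""
        self._last_activity = self._clock()

    def idle_for(self):
        """Seconds elapsed since the last recorded activity."""
        return self._clock() - self._last_activity

    def should_shut_down(self):
        return self.idle_for() >= self.idle_timeout_s
```

A chat-style deployment might call `touch()` on every inbound message and check `should_shut_down()` on a timer. `time.monotonic` is the default clock so that wall-clock adjustments do not trigger spurious shutdowns.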
Key Takeaways
- SDK runs as a long-running process, not a stateless call — it needs a persistent container environment.
- Choose deployment pattern based on task lifetime: ephemeral (one-off) → long-running (proactive) → hybrid (intermittent) → single-container multi-agent (collaboration).
- Every instance needs Node.js, even for the Python SDK (the Claude Code CLI is bundled).
- Minimum recommended specs: 1 GiB RAM, 5 GiB disk, 1 CPU.
- Use `maxTurns` to guard against infinite agent loops.
- Standard backend observability (logs, metrics) works as-is; containers are just servers.
Related Articles
- wiki/agent-sdk/configure-permissions — control what tools agents can use inside containers
- wiki/agent-sdk/hooks-guide — automate container lifecycle events
- wiki/agent-sdk/mcp-integration — extend agents with external tool servers
- wiki/agent-sdk/user-input-approvals — handle human-in-the-loop approvals in hosted agents
- wiki/architecture/docker-compose — self-hosted container orchestration
Sources
raw/Hosting the Agent SDK.md