| title | aliases | tags | sources | created | updated |
|---|---|---|---|---|---|
| Securely Deploying AI Agents | | | | 2026-04-17 | 2026-04-17 |
Securely Deploying AI Agents
Unlike deterministic software, Claude Code and the Agent SDK generate actions dynamically based on context — making them susceptible to prompt injection: malicious instructions embedded in files, webpages, or user input that redirect agent behavior. Defense in depth is the answer.
Not every deployment needs maximum hardening. A developer running locally has different needs than a multi-tenant production system processing untrusted content.
Threat Model
- Prompt injection — content processed by the agent (READMEs, web pages, files) may contain adversarial instructions
- Model error — unexpected actions even without adversarial input
- Credential exposure — agents accessing APIs may leak secrets if not isolated
- Resource abuse — unbounded memory/CPU/process spawning in multi-tenant environments
Built-in Security Features
| Feature | What it does |
|---|---|
| Permissions system | Allow/block/prompt per tool or bash command; glob patterns; org-wide policies |
| Command AST parsing | Parses bash into AST before execution; unrecognized constructs and eval always require approval |
| Web search summarization | Summarizes search results instead of passing raw HTML into context |
| Sandbox mode | Optional OS-level filesystem + network restrictions (see wiki/agent-sdk/configure-permissions) |
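To make the allow/block/prompt idea concrete, here is a minimal sketch of glob-based rule evaluation. The rule tuple format, the `RULES` list, and the first-match-wins ordering are assumptions made for illustration; they are not the actual rule syntax or evaluation order of the permissions system, which also parses bash into an AST rather than matching raw strings.

```python
from fnmatch import fnmatchcase

# Hypothetical rule format: (decision, tool, glob pattern over the tool input).
# Rules are checked in order and the first match wins; this ordering is an
# assumption of the sketch, not the documented evaluation order.
RULES = [
    ("block", "Bash", "rm -rf *"),
    ("allow", "Bash", "git status*"),
    ("allow", "Read", "/workspace/*"),
]

def decide(tool: str, tool_input: str, rules=RULES) -> str:
    """Return 'allow', 'block', or 'prompt' for a requested tool call."""
    for decision, rule_tool, pattern in rules:
        if tool == rule_tool and fnmatchcase(tool_input, pattern):
            return decision
    # Anything not explicitly covered falls back to asking a human.
    return "prompt"
```

Defaulting to "prompt" for unmatched calls keeps the sketch fail-safe: a missing rule results in a question, not an action.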
Security Principles
Least Privilege
| Resource | Restriction |
|---|---|
| Filesystem | Mount only needed dirs; prefer read-only |
| Network | Restrict to specific endpoints via proxy |
| Credentials | Inject via proxy — never expose directly |
| System capabilities | Drop Linux capabilities in containers |
Defense in Depth
Layer multiple controls: container isolation → network restrictions → filesystem controls → proxy-level request validation. Each layer limits blast radius if another fails.
Isolation Technologies
| Technology | Isolation | Perf overhead | Complexity |
|---|---|---|---|
| sandbox-runtime | Good | Very low | Low |
| Docker containers | Setup-dependent | Low | Medium |
| gVisor | Excellent | Medium–High | Medium |
| VMs (Firecracker/QEMU) | Excellent | High | Medium–High |
sandbox-runtime
Lightweight, no Docker needed. Uses OS primitives (bubblewrap on Linux, sandbox-exec on macOS).
npm install @anthropic-ai/sandbox-runtime
- Filesystem: restricts read/write to configured paths
- Network: routes all traffic through built-in proxy with domain allowlists
- Limitation: shares host kernel — not suitable for kernel-level isolation requirements
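The domain allowlist behind the network restriction can be pictured as a simple hostname check. The sketch below is not sandbox-runtime's implementation; the function name `host_allowed` and the `*.` wildcard convention are assumptions chosen for illustration.

```python
def host_allowed(host: str, allowlist: list[str]) -> bool:
    """Check a hostname against exact entries and '*.example.com' wildcards."""
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("*."):
            suffix = entry[1:]  # e.g. ".example.com"
            # Wildcard matches subdomains only, not the bare domain.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == entry:
            return True
    return False
```

Matching on the leading dot of the suffix is what prevents a lookalike such as `evilexample.com` from passing a `*.example.com` rule.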
Hardened Docker Container
docker run \
--cap-drop ALL \
--security-opt no-new-privileges \
--security-opt seccomp=/path/to/seccomp-profile.json \
--read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=100m \
--network none \
--memory 2g \
--pids-limit 100 \
--user 1000:1000 \
-v /path/to/code:/workspace:ro \
-v /var/run/proxy.sock:/var/run/proxy.sock:ro \
agent-image
Key flags:
- --cap-drop ALL: removes NET_ADMIN, SYS_ADMIN, etc.
- --network none: no network interfaces; agent communicates only via mounted Unix socket to host proxy
- --read-only + --tmpfs: immutable root fs with ephemeral scratch space
- -v ...:/workspace:ro (read-only code mount): never mount ~/.ssh, ~/.aws, ~/.config
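When the container is launched from a script, the "never mount" rule can be enforced in code instead of by convention. The following sketch builds an argument list from the flags above and refuses a code directory that would expose one of the sensitive directories. The helper name `docker_args` and the `FORBIDDEN` tuple are hypothetical, and the seccomp profile and proxy socket mount are left out because their paths are deployment-specific.

```python
from pathlib import PurePosixPath

# Directories the section says must never be mounted (relative to home).
FORBIDDEN = (".ssh", ".aws", ".config")

def docker_args(code_dir: str, home: str, image: str = "agent-image") -> list[str]:
    """Build a hardened `docker run` argv, refusing sensitive mounts."""
    src = PurePosixPath(code_dir)
    home_path = PurePosixPath(home)
    for name in FORBIDDEN:
        secret = home_path / name
        # Reject the secret dir itself, anything inside it, and any parent of it.
        if src == secret or secret in src.parents or src in secret.parents:
            raise ValueError(f"refusing to mount {src}: exposes {secret}")
    return [
        "docker", "run",
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--read-only",
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=100m",
        "--network", "none",
        "--memory", "2g",
        "--pids-limit", "100",
        "--user", "1000:1000",
        "-v", f"{src}:/workspace:ro",
        image,
    ]
```

The resulting list can be passed to `subprocess.run` directly, which avoids shell quoting problems in the mount path.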
gVisor
Intercepts syscalls in userspace — the agent never directly touches the host kernel.
// /etc/docker/daemon.json
{ "runtimes": { "runsc": { "path": "/usr/local/bin/runsc" } } }
docker run --runtime=runsc agent-image
Performance: CPU-bound ≈ 0% overhead; file I/O can be 10–200× slower for heavy open/close patterns.
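Because the file I/O penalty depends heavily on the workload, it can be worth measuring it for your own image. A minimal sketch of an open/close microbenchmark follows; run the same script under the default runtime and under runsc and compare the two numbers. The function name and iteration count are arbitrary choices, and the result is only a rough indicator.

```python
import os
import tempfile
import time

def open_close_rate(iterations: int = 2000) -> float:
    """Return open/close cycles per second on a temporary file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            with open(path, "rb"):
                pass
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return iterations / elapsed

if __name__ == "__main__":
    print(f"{open_close_rate():.0f} open/close cycles per second")
```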
Firecracker MicroVMs
- Boot time < 125ms, < 5 MiB overhead
- Agent VM has no external network; all traffic routed via vsock to host proxy
- Suitable for per-request isolation in multi-tenant systems
Cloud Deployments
- Private subnet with no internet gateway
- Cloud firewall (AWS SG / GCP VPC) blocks all egress except to proxy
- Proxy (e.g. Envoy with credential_injector) validates requests, enforces allowlists, injects credentials, and logs traffic
- Minimal IAM permissions on agent's service account
Credential Management
Core pattern: run a proxy outside the agent's security boundary that injects credentials. The agent never sees the actual secret.
Benefits:
- Credentials stored in one place, not distributed to agents
- Proxy enforces endpoint allowlists
- All requests logged for audit
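The core of the pattern fits in one function that runs on the proxy side. This is a sketch under stated assumptions, not a production proxy: the `SECRETS` mapping, the host `api.example.com`, the placeholder token, and the function name are all hypothetical, and a real deployment would load secrets from a secret store.

```python
import logging

log = logging.getLogger("cred-proxy")

# Hypothetical configuration: upstream host -> secret held only by the proxy.
SECRETS = {"api.example.com": "s3cr3t-token"}

def prepare_upstream_request(host: str, path: str, headers: dict[str, str]) -> dict[str, str]:
    """Validate the destination, then return headers with the credential added."""
    if host not in SECRETS:
        log.warning("denied %s%s", host, path)
        raise PermissionError(f"{host} is not on the allowlist")
    # Drop any Authorization header the agent supplied so it cannot choose the identity.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    out["Authorization"] = f"Bearer {SECRETS[host]}"
    log.info("forwarded %s%s", host, path)  # audit trail; never log the secret
    return out
```

All three benefits show up in the same place: the secret lives only in the proxy's configuration, unknown hosts are rejected, and every decision is logged.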
Proxy Configuration
Option 1 — sampling requests only:
export ANTHROPIC_BASE_URL="http://localhost:8080"
Option 2 — system-wide (all HTTP traffic):
export HTTP_PROXY="http://localhost:8080"
export HTTPS_PROXY="http://localhost:8080"
Note: HTTP_PROXY/HTTPS_PROXY creates opaque TLS tunnels for HTTPS — proxy can't inspect/modify without TLS termination. Node.js fetch() ignores these by default; set NODE_USE_ENV_PROXY=1 in Node 24+.
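If the agent is started from a launcher script, the two options can be applied to the child process only, leaving the launcher's own environment untouched. The helper below is a hypothetical sketch; `agent_env` is not part of any SDK.

```python
import os

def agent_env(proxy_url: str, system_wide: bool = False) -> dict[str, str]:
    """Return a copy of the environment pointing the agent at the proxy."""
    env = dict(os.environ)
    if system_wide:
        # Option 2: all HTTP(S) clients that honour these variables use the proxy.
        env["HTTP_PROXY"] = proxy_url
        env["HTTPS_PROXY"] = proxy_url
        # Per the note above, Node 24+ fetch() needs this to respect the proxy vars.
        env["NODE_USE_ENV_PROXY"] = "1"
    else:
        # Option 1: only sampling requests go to the proxy.
        env["ANTHROPIC_BASE_URL"] = proxy_url
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the agent process.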
Proxy Options
| Proxy | Use case |
|---|---|
| Envoy | Production; credential_injector filter |
| mitmproxy | TLS-terminating; inspect/modify HTTPS |
| Squid | ACL-based caching proxy |
| LiteLLM | LLM gateway with rate limiting |
Credentials for Other Services
MCP/custom tools (preferred): Agent calls a tool; the actual authenticated request happens outside the agent boundary. No TLS interception needed.
TLS-terminating proxy: Install proxy's CA cert in agent's trust store + configure HTTP_PROXY. Use proxychains or iptables for programs that bypass env vars.
Filesystem Configuration
Files to Exclude Before Mounting
| File | Risk |
|---|---|
| .env, .env.local | API keys, DB passwords |
| ~/.aws/credentials | AWS access keys |
| ~/.config/gcloud/application_default_credentials.json | GCP tokens |
| ~/.kube/config | Kubernetes credentials |
| *.pem, *.key | Private keys |
| .npmrc, .pypirc | Registry tokens |
| *-service-account.json | GCP service account keys |
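A pre-mount check can catch the filename-based entries from the table above before a directory is handed to the agent. The sketch below covers only patterns that can appear inside a project tree; the home-directory paths (~/.aws, ~/.kube, and so on) are handled by not mounting those directories at all. The names `SENSITIVE` and `find_sensitive` are illustrative.

```python
import os
from fnmatch import fnmatch

# Filename patterns taken from the table above.
SENSITIVE = [
    ".env", ".env.local", "*.pem", "*.key",
    ".npmrc", ".pypirc", "*-service-account.json",
]

def find_sensitive(root: str) -> list[str]:
    """Return paths under root whose filename matches a sensitive pattern."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if any(fnmatch(name, pat) for pat in SENSITIVE):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

A launcher could refuse to start the container whenever the returned list is non-empty, or copy the tree to a staging directory with those files left out.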
Writable Workspace Options
| Approach | Persistence | Use case |
|---|---|---|
| --tmpfs | Ephemeral (cleared on stop) | CI/CD, stateless agents |
| Overlay filesystem | Inspect then apply/discard | Review-before-commit workflows |
| Named volume (separate dir) | Persistent | Output collection |
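The review-before-commit workflow can be approximated without a real overlay filesystem by letting the agent work on a scratch copy and copying back only approved files. This is a copy-based stand-in, not an overlay mount; the function names are hypothetical and the sketch does not track deletions.

```python
import filecmp
import os
import shutil
import tempfile

def stage_workspace(src: str) -> str:
    """Copy src into a scratch directory the agent may modify freely."""
    scratch = os.path.join(tempfile.mkdtemp(prefix="agent-"), "work")
    shutil.copytree(src, scratch)
    return scratch

def changed_files(src: str, scratch: str) -> list[str]:
    """List relative paths that were added or modified in the scratch copy."""
    changed = []
    for dirpath, _dirs, files in os.walk(scratch):
        for name in files:
            new = os.path.join(dirpath, name)
            rel = os.path.relpath(new, scratch)
            old = os.path.join(src, rel)
            if not os.path.exists(old) or not filecmp.cmp(old, new, shallow=False):
                changed.append(rel)
    return sorted(changed)

def apply_changes(src: str, scratch: str, approved: list[str]) -> None:
    """Copy only the reviewed files back into the real workspace."""
    for rel in approved:
        dest = os.path.join(src, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(os.path.join(scratch, rel), dest)
```

A reviewer would inspect the output of `changed_files`, then pass the accepted subset to `apply_changes` and delete the scratch directory.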
Key Takeaways
- Prompt injection is the primary threat — content the agent processes can redirect its behavior; built-in summarization and permissions help, but aren't sufficient alone
- Proxy pattern is the gold standard for credentials — agent never sees secrets; proxy outside the boundary injects them and enforces allowlists
- --network none + Unix socket is the strongest container network control: agent can only reach what the host proxy allows
- gVisor for multi-tenant or untrusted content: reduces kernel attack surface significantly despite I/O overhead
- Never mount sensitive credential directories: ~/.ssh, ~/.aws, ~/.config must stay outside the agent's view
- ANTHROPIC_BASE_URL vs HTTP_PROXY: former routes only sampling calls in plaintext; latter routes all traffic but creates opaque HTTPS tunnels
- Least privilege is layered: filesystem (read-only mounts) + network (allowlists) + capabilities (--cap-drop ALL) + process limits (--pids-limit)
Related
- wiki/agent-sdk/configure-permissions — permission modes, allow/deny rules, evaluation order
- wiki/agent-sdk/hosting-production — container requirements, deployment patterns
- wiki/agent-sdk/sdk-hooks — PreToolUse/PostToolUse callbacks for runtime control
- wiki/agent-sdk/mcp-integration — MCP servers as a credential-safe tool boundary
- wiki/architecture/docker-compose — container orchestration context
Sources
- raw/Securely deploying AI agents.md
- Official docs: https://code.claude.com/docs/en/agent-sdk/secure-deployment