obsidian/wiki/agent-sdk/secure-deployment.md
2026-04-17 13:11:43 +01:00

title: Securely Deploying AI Agents
aliases: agent-security, secure-agent-deployment, agent-hardening
tags: security, agent-sdk, deployment, isolation, credentials, docker, proxy
sources: raw/Securely deploying AI agents.md
created: 2026-04-17
updated: 2026-04-17

Securely Deploying AI Agents

Unlike deterministic software, Claude Code and the Agent SDK generate actions dynamically based on context, which makes them susceptible to prompt injection: malicious instructions embedded in files, webpages, or user input that redirect agent behavior. The mitigation is defense in depth: several independent controls, so that one failing does not expose the whole system.

Not every deployment needs maximum hardening. A developer running locally has different needs than a multi-tenant production system processing untrusted content.

Threat Model

  • Prompt injection — content processed by the agent (READMEs, web pages, files) may contain adversarial instructions
  • Model error — unexpected actions even without adversarial input
  • Credential exposure — agents accessing APIs may leak secrets if not isolated
  • Resource abuse — unbounded memory/CPU/process spawning in multi-tenant environments

Built-in Security Features

| Feature | What it does |
| --- | --- |
| Permissions system | Allow/block/prompt per tool or bash command; glob patterns; org-wide policies |
| Command AST parsing | Parses bash into AST before execution; unrecognized constructs and eval always require approval |
| Web search summarization | Summarizes search results instead of passing raw HTML into context |
| Sandbox mode | Optional OS-level filesystem + network restrictions (see wiki/agent-sdk/configure-permissions) |
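
To make the allow/block/prompt model concrete, here is a minimal sketch of glob-based rule evaluation. It is not the SDK's implementation or its rule syntax: the rule table, the `decide` function, the "most restrictive match wins" precedence, and the default of prompting for unknown commands are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (glob pattern, decision). Not the SDK's real syntax.
RULES = [
    ("rm -rf *", "block"),
    ("curl *", "prompt"),
    ("git status", "allow"),
    ("npm run test*", "allow"),
]

# Assumed precedence: the most restrictive matching decision wins.
PRECEDENCE = {"block": 0, "prompt": 1, "allow": 2}


def decide(command: str, rules=RULES) -> str:
    """Return 'allow', 'prompt', or 'block' for a bash command string."""
    matches = [decision for pattern, decision in rules
               if fnmatchcase(command, pattern)]
    if not matches:
        return "prompt"  # assumed default: unknown commands need approval
    return min(matches, key=PRECEDENCE.__getitem__)
```

Plain string globbing like this cannot see inside a chained command such as `git status; rm -rf /` (here it matches no rule and falls back to prompting), which is one reason the table above lists AST parsing as a separate feature.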

Security Principles

Least Privilege

| Resource | Restriction |
| --- | --- |
| Filesystem | Mount only needed dirs; prefer read-only |
| Network | Restrict to specific endpoints via proxy |
| Credentials | Inject via proxy; never expose directly |
| System capabilities | Drop Linux capabilities in containers |

Defense in Depth

Layer multiple controls: container isolation → network restrictions → filesystem controls → proxy-level request validation. Each layer limits blast radius if another fails.

Isolation Technologies

| Technology | Isolation | Perf overhead | Complexity |
| --- | --- | --- | --- |
| sandbox-runtime | Good | Very low | Low |
| Docker containers | Setup-dependent | Low | Medium |
| gVisor | Excellent | Medium/High | Medium |
| VMs (Firecracker/QEMU) | Excellent | High | Medium/High |

sandbox-runtime

Lightweight, no Docker needed. Uses OS primitives (bubblewrap on Linux, sandbox-exec on macOS).

npm install @anthropic-ai/sandbox-runtime
  • Filesystem: restricts read/write to configured paths
  • Network: routes all traffic through built-in proxy with domain allowlists
  • Limitation: shares host kernel — not suitable for kernel-level isolation requirements
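
A domain allowlist of the kind the built-in proxy enforces can be sketched as a simple host check. This is an illustration only, not sandbox-runtime's code or configuration format; the `host_allowed` function and the `*.domain` wildcard convention are assumptions.

```python
def host_allowed(host: str, allowlist: list) -> bool:
    """Return True if host matches an exact entry or a '*.domain' wildcard entry."""
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("*."):
            # "*.example.com" matches subdomains only, not "example.com" itself.
            # Keeping the leading dot prevents "evilexample.com" from matching.
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False
```

For example, `host_allowed("registry.npmjs.org", ["*.npmjs.org"])` is true, while a lookalike such as `evilnpmjs.org` is rejected.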

Hardened Docker Container

docker run \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --security-opt seccomp=/path/to/seccomp-profile.json \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=100m \
  --network none \
  --memory 2g \
  --pids-limit 100 \
  --user 1000:1000 \
  -v /path/to/code:/workspace:ro \
  -v /var/run/proxy.sock:/var/run/proxy.sock:ro \
  agent-image

Key flags:

  • --cap-drop ALL — removes NET_ADMIN, SYS_ADMIN, etc.
  • --network none — no network interfaces; agent communicates only via mounted Unix socket to host proxy
  • --read-only + --tmpfs — immutable root fs with ephemeral scratch space
  • -v ...:/workspace:ro mounts only the code directory, read-only; never mount ~/.ssh, ~/.aws, or ~/.config
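
When the container is launched from a script, the same flags can be assembled programmatically with a guard against mounting credential directories. The sketch below is one possible wrapper, not part of the SDK: the function name, its parameters, and the overlap check are assumptions, and the seccomp profile is optional because the path in the command above is a placeholder.

```python
from pathlib import Path
from typing import List, Optional

# Directory names under the home directory that must stay out of the container.
FORBIDDEN_HOME_DIRS = (".ssh", ".aws", ".config")


def hardened_docker_argv(code_dir: str,
                         image: str = "agent-image",
                         proxy_sock: str = "/var/run/proxy.sock",
                         seccomp_profile: Optional[str] = None,
                         home: Optional[Path] = None) -> List[str]:
    """Build the docker argv shown above; reject mounts that would expose secrets."""
    src = Path(code_dir).expanduser().resolve()
    home_dir = (home or Path.home()).resolve()
    for name in FORBIDDEN_HOME_DIRS:
        secret = home_dir / name
        # Reject the secret dir itself, anything inside it, and any parent of it.
        if src == secret or secret in src.parents or src in secret.parents:
            raise ValueError(f"refusing to mount {src}: it overlaps {secret}")
    argv = [
        "docker", "run",
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--read-only",
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=100m",
        "--network", "none",
        "--memory", "2g",
        "--pids-limit", "100",
        "--user", "1000:1000",
        "-v", f"{src}:/workspace:ro",
        "-v", f"{proxy_sock}:{proxy_sock}:ro",
    ]
    if seccomp_profile:
        argv += ["--security-opt", f"seccomp={seccomp_profile}"]
    return argv + [image]
```

The result can be passed to `subprocess.run(argv, check=True)`. Passing the command as a list avoids shell interpolation of the mount path.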

gVisor

Intercepts syscalls in userspace — the agent never directly touches the host kernel.

/etc/docker/daemon.json:
{ "runtimes": { "runsc": { "path": "/usr/local/bin/runsc" } } }

After restarting the Docker daemon, run the container with the gVisor runtime:
docker run --runtime=runsc agent-image

Performance: CPU-bound ≈ 0% overhead; file I/O can be 10-200× slower for heavy open/close patterns.
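
To see how much the open/close overhead matters for a particular workload, a rough micro-benchmark can be run once under the default runtime and once with `--runtime=runsc`. This is only a sketch: the function name and iteration count are arbitrary choices, and the numbers it produces depend on the host and the gVisor configuration.

```python
import os
import tempfile
import time


def time_open_close(iterations: int = 10_000) -> float:
    """Return the seconds spent on `iterations` open/close cycles of one temp file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            # Each cycle issues an open and a close syscall, which gVisor intercepts.
            os.close(os.open(path, os.O_RDONLY))
        return time.perf_counter() - start
    finally:
        os.unlink(path)
```

Comparing the two timings gives a workload-specific slowdown factor instead of relying on the general range quoted above.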

Firecracker MicroVMs

  • Boots in under 125 ms with less than 5 MiB of memory overhead per microVM
  • Agent VM has no external network — all traffic routed via vsock to host proxy
  • Suitable for per-request isolation in multi-tenant systems

Cloud Deployments

  1. Private subnet with no internet gateway
  2. Cloud firewall rules (AWS security groups / GCP VPC firewall) block all egress except to the proxy
  3. Proxy (e.g. Envoy with credential_injector) validates, allowlists, injects creds, logs
  4. Minimal IAM permissions on agent's service account

Credential Management

Core pattern: run a proxy outside the agent's security boundary that injects credentials. The agent never sees the actual secret.

Benefits:

  • Credentials stored in one place, not distributed to agents
  • Proxy enforces endpoint allowlists
  • All requests logged for audit
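
The pattern can be illustrated with a very small injecting proxy built on Python's standard library. This is a sketch under several assumptions, not a production proxy and not the method the official docs prescribe: the names `ALLOWED_PATHS`, `build_upstream_headers`, `InjectingProxy`, and `serve` are hypothetical, the key is assumed to live in the proxy host's `ANTHROPIC_API_KEY` environment variable, and responses are buffered whole, so streaming is not handled.

```python
import os
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM = "https://api.anthropic.com"   # the only host this proxy will contact
ALLOWED_PATHS = ("/v1/messages",)        # assumed endpoint allowlist


def build_upstream_headers(incoming: dict, api_key: str) -> dict:
    """Drop any auth the agent sent and attach the real key held by the proxy."""
    blocked = {"x-api-key", "authorization", "host",
               "content-length", "accept-encoding", "connection"}
    headers = {k: v for k, v in incoming.items() if k.lower() not in blocked}
    headers["x-api-key"] = api_key
    return headers


class InjectingProxy(BaseHTTPRequestHandler):
    # BaseHTTPRequestHandler logs each request line to stderr by default,
    # which serves as a basic audit trail in this sketch.
    def do_POST(self):
        if self.path.split("?", 1)[0] not in ALLOWED_PATHS:
            self.send_error(403, "endpoint not on allowlist")
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        request = urllib.request.Request(
            UPSTREAM + self.path, data=body, method="POST",
            headers=build_upstream_headers(dict(self.headers.items()),
                                           os.environ["ANTHROPIC_API_KEY"]),
        )
        try:
            with urllib.request.urlopen(request) as upstream:
                status, payload = upstream.status, upstream.read()
                ctype = upstream.headers.get("Content-Type", "application/json")
        except urllib.error.HTTPError as err:
            status, payload = err.code, err.read()
            ctype = err.headers.get("Content-Type", "application/json")
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the proxy on localhost; call this on the host, outside the agent boundary."""
    ThreadingHTTPServer(("127.0.0.1", port), InjectingProxy).serve_forever()
```

Because the proxy strips whatever credentials the agent sends before adding its own, a compromised agent cannot substitute a different key or read the real one back from its own request.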

Proxy Configuration

Option 1 — sampling requests only:

export ANTHROPIC_BASE_URL="http://localhost:8080"

Option 2 — system-wide (all HTTP traffic):

export HTTP_PROXY="http://localhost:8080"
export HTTPS_PROXY="http://localhost:8080"

Note: HTTP_PROXY/HTTPS_PROXY creates opaque TLS tunnels for HTTPS — proxy can't inspect/modify without TLS termination. Node.js fetch() ignores these by default; set NODE_USE_ENV_PROXY=1 in Node 24+.
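
When the agent is started as a subprocess, the two options amount to building a different environment for it. The helper below is a hypothetical sketch (`agent_env` is not an SDK function). It also removes `ANTHROPIC_API_KEY` from the child environment on the assumption that the proxy injects the credential; some clients may still require a placeholder value to be set.

```python
import os


def agent_env(proxy_url: str = "http://localhost:8080",
              system_wide: bool = False) -> dict:
    """Return an environment for the agent process that routes traffic via the proxy."""
    # Copy the parent environment but keep the real key out of the agent's reach.
    env = {k: v for k, v in os.environ.items() if k != "ANTHROPIC_API_KEY"}
    if system_wide:
        # Option 2: all HTTP(S) traffic; Node.js needs the opt-in flag noted above.
        env.update(HTTP_PROXY=proxy_url, HTTPS_PROXY=proxy_url,
                   NODE_USE_ENV_PROXY="1")
    else:
        # Option 1: only sampling requests go to the proxy.
        env["ANTHROPIC_BASE_URL"] = proxy_url
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` or `subprocess.Popen`.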

Proxy Options

| Proxy | Use case |
| --- | --- |
| Envoy | Production; credential_injector filter |
| mitmproxy | TLS-terminating; inspect/modify HTTPS |
| Squid | ACL-based caching proxy |
| LiteLLM | LLM gateway with rate limiting |

Credentials for Other Services

MCP/custom tools (preferred): Agent calls a tool; the actual authenticated request happens outside the agent boundary. No TLS interception needed.

TLS-terminating proxy: Install proxy's CA cert in agent's trust store + configure HTTP_PROXY. Use proxychains or iptables for programs that bypass env vars.
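
The tool-based approach can be sketched as a function that runs outside the agent boundary and owns the token. Everything specific here is hypothetical: the tracker URL, the `TRACKER_TOKEN` variable, and the function names are placeholders, and registering the function as an MCP or custom tool is left out.

```python
import json
import os
import urllib.request

ISSUES_URL = "https://tracker.example.test/api/issues"  # hypothetical service


def build_issue_request(title: str, body: str, token: str) -> urllib.request.Request:
    """Validate agent-supplied arguments and build the authenticated request."""
    if not title.strip() or len(title) > 200:
        raise ValueError("title must be 1-200 characters")
    payload = json.dumps({"title": title, "body": body}).encode()
    return urllib.request.Request(
        ISSUES_URL, data=payload, method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def create_issue_tool(title: str, body: str) -> dict:
    """Tool entry point; runs outside the agent boundary, where the token lives."""
    request = build_issue_request(title, body, os.environ["TRACKER_TOKEN"])
    with urllib.request.urlopen(request) as response:
        # Return only what the agent needs; never echo request headers back.
        return {"status": response.status}
```

The agent supplies only `title` and `body`; the URL and the token are fixed on the tool side, so a prompt-injected agent cannot redirect the credential to another host.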

Filesystem Configuration

Files to Exclude Before Mounting

| File | Risk |
| --- | --- |
| .env, .env.local | API keys, DB passwords |
| ~/.aws/credentials | AWS access keys |
| ~/.config/gcloud/application_default_credentials.json | GCP tokens |
| ~/.kube/config | Kubernetes credentials |
| *.pem, *.key | Private keys |
| .npmrc, .pypirc | Registry tokens |
| *-service-account.json | GCP service account keys |
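
A pre-mount check for the in-tree entries of this list can be automated. The sketch below scans a directory for file names matching the patterns above; the function name is an assumption, and the home-directory paths in the table (~/.aws, ~/.kube, and so on) are not covered because those are handled by not mounting them at all.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# File-name patterns taken from the table above (in-tree files only).
SENSITIVE_NAME_PATTERNS = [
    ".env", ".env.local", "*.pem", "*.key",
    ".npmrc", ".pypirc", "*-service-account.json",
]


def find_sensitive_files(root: str) -> list:
    """Return sorted relative paths under root whose names match a sensitive pattern."""
    root_path = Path(root)
    hits = []
    for path in root_path.rglob("*"):  # rglob also visits dotfiles
        if path.is_file() and any(fnmatchcase(path.name, pattern)
                                  for pattern in SENSITIVE_NAME_PATTERNS):
            hits.append(path.relative_to(root_path).as_posix())
    return sorted(hits)
```

A launcher script could refuse to start the container while this list is non-empty, or copy the tree to a staging directory with those files left out.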

Writable Workspace Options

| Approach | Persistence | Use case |
| --- | --- | --- |
| --tmpfs | Ephemeral (cleared on stop) | CI/CD, stateless agents |
| Overlay filesystem | Inspect then apply/discard | Review-before-commit workflows |
| Named volume (separate dir) | Persistent | Output collection |
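
The inspect-then-apply workflow can be approximated without a real overlay filesystem by letting the agent write to a scratch copy and diffing it afterwards. The sketch below uses a plain directory copy as a stand-in for OverlayFS; the function names are assumptions, and deletions made in the scratch copy are not reported.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def make_scratch_copy(workspace: str) -> Path:
    """Copy the workspace to a temp dir; the agent gets write access to the copy only."""
    scratch = Path(tempfile.mkdtemp(prefix="agent-scratch-")) / "work"
    shutil.copytree(workspace, scratch)
    return scratch


def changed_files(workspace: str, scratch: str) -> list:
    """Relative paths that are new or modified in scratch compared with workspace."""
    changes = []

    def walk(cmp: filecmp.dircmp, prefix: Path) -> None:
        # right_only: created by the agent; diff_files: modified by the agent.
        for name in cmp.right_only + cmp.diff_files:
            changes.append((prefix / name).as_posix())
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(workspace, scratch), Path(""))
    return sorted(changes)


def apply_change(workspace: str, scratch: str, relative: str) -> None:
    """Copy one reviewed file or directory from the scratch copy into the workspace."""
    src, dst = Path(scratch) / relative, Path(workspace) / relative
    if src.is_dir():
        shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+
    else:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Note that `filecmp.dircmp` compares files by size and modification time before falling back to contents, so a same-size edit made within the timestamp resolution could be missed; a hash-based comparison would be stricter.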

Key Takeaways

  • Prompt injection is the primary threat — content the agent processes can redirect its behavior; built-in summarization and permissions help, but aren't sufficient alone
  • Proxy pattern is the gold standard for credentials — agent never sees secrets; proxy outside the boundary injects them and enforces allowlists
  • --network none + Unix socket is the strongest container network control — agent can only reach what the host proxy allows
  • gVisor for multi-tenant or untrusted content — reduces kernel attack surface significantly despite I/O overhead
  • Never mount sensitive credential directories: ~/.ssh, ~/.aws, ~/.config must stay outside the agent's view
  • ANTHROPIC_BASE_URL vs HTTP_PROXY: the former routes only sampling (model API) calls, sent to the proxy in plaintext so it can inspect and modify them; the latter routes all traffic but creates opaque HTTPS tunnels
  • Least privilege is layered — filesystem (read-only mounts) + network (allowlists) + capabilities (--cap-drop ALL) + process limits (--pids-limit)

Sources

  • raw/Securely deploying AI agents.md
  • Official docs: https://code.claude.com/docs/en/agent-sdk/secure-deployment