Vadym Samoilenko 4ba92e8703 vault backup: 2026-04-17 13:11:43

2026-04-17 13:11:43 +01:00

Securely Deploying AI Agents

Unlike deterministic software, Claude Code and the Agent SDK generate actions dynamically from context, which makes them susceptible to prompt injection: malicious instructions embedded in files, webpages, or user input that redirect agent behavior. No single control stops this, so the recommended approach is defense in depth: several independent layers, each limiting the damage if another fails.

Not every deployment needs maximum hardening. A developer running locally has different needs than a multi-tenant production system processing untrusted content.

Threat Model

Prompt injection — content processed by the agent (READMEs, web pages, files) may contain adversarial instructions
Model error — unexpected actions even without adversarial input
Credential exposure — agents accessing APIs may leak secrets if not isolated
Resource abuse — unbounded memory/CPU/process spawning in multi-tenant environments

Built-in Security Features

Feature	What it does
Permissions system	Allow/block/prompt per tool or bash command; glob patterns; org-wide policies
Command AST parsing	Parses bash into AST before execution; unrecognized constructs and `eval` always require approval
Web search summarization	Summarizes search results instead of passing raw HTML into context
Sandbox mode	Optional OS-level filesystem + network restrictions (see wiki/agent-sdk/configure-permissions)
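The allow/block/prompt behaviour with glob patterns can be pictured with a small rule evaluator. This is only an illustrative sketch: the `Rule` type, the `decide` function, the deny-before-ask-before-allow precedence, and the sample patterns are assumptions made for this note, not Claude Code's actual implementation or rule syntax.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    action: str   # "deny", "ask", or "allow"
    tool: str     # tool name, e.g. "Bash" (hypothetical naming)
    pattern: str  # glob matched against the whole tool input


# Assumed precedence for this sketch: deny beats ask, ask beats allow.
PRECEDENCE = ("deny", "ask", "allow")


def decide(rules, tool, tool_input, default="ask"):
    """Return the action of the highest-precedence rule that matches."""
    for action in PRECEDENCE:
        for rule in rules:
            if (rule.action == action and rule.tool == tool
                    and fnmatchcase(tool_input, rule.pattern)):
                return action
    return default  # nothing matched: fall back to prompting


RULES = [
    Rule("allow", "Bash", "git status*"),
    Rule("allow", "Bash", "npm test*"),
    Rule("deny", "Bash", "curl *"),
    Rule("ask", "Bash", "git push*"),
]
```

Under plain string globbing, an input such as `git status && curl x` matches `git status*` and is allowed, even though it runs `curl`. That weakness is one reason the table lists command parsing as a separate feature alongside pattern rules.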

Security Principles

Least Privilege

Resource	Restriction
Filesystem	Mount only needed dirs; prefer read-only
Network	Restrict to specific endpoints via proxy
Credentials	Inject via proxy — never expose directly
System capabilities	Drop Linux capabilities in containers

Defense in Depth

Layer multiple controls: container isolation → network restrictions → filesystem controls → proxy-level request validation. Each layer limits blast radius if another fails.

Isolation Technologies

Technology	Isolation	Perf overhead	Complexity
sandbox-runtime	Good	Very low	Low
Docker containers	Setup-dependent	Low	Medium
gVisor	Excellent	Medium–High	Medium
VMs (Firecracker/QEMU)	Excellent	High	Medium–High

sandbox-runtime

Lightweight, no Docker needed. Uses OS primitives (bubblewrap on Linux, sandbox-exec on macOS).

npm install @anthropic-ai/sandbox-runtime

Filesystem: restricts read/write to configured paths
Network: routes all traffic through built-in proxy with domain allowlists
Limitation: shares host kernel — not suitable for kernel-level isolation requirements
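To make the idea of a domain allowlist concrete, here is a minimal sketch of the check a filtering proxy might apply to each outbound host. The function name and the `*.domain` wildcard semantics are assumptions for illustration; they are not taken from sandbox-runtime's configuration format.

```python
def host_allowed(host, allowlist):
    """True if host equals an entry or matches a '*.domain' wildcard entry."""
    host = host.lower().rstrip(".")  # normalise case and a trailing root dot
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("*."):
            # '*.example.com' matches subdomains only, not 'example.com' itself.
            # Keeping the leading dot prevents 'evilexample.com' from matching.
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False


ALLOWLIST = ["api.anthropic.com", "*.github.com"]  # example entries only
```

Matching on the dot-prefixed suffix, rather than a bare substring, is the important design choice: substring checks are a common way for allowlists to be bypassed.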

Hardened Docker Container

docker run \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --security-opt seccomp=/path/to/seccomp-profile.json \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=100m \
  --network none \
  --memory 2g \
  --pids-limit 100 \
  --user 1000:1000 \
  -v /path/to/code:/workspace:ro \
  -v /var/run/proxy.sock:/var/run/proxy.sock:ro \
  agent-image

Key flags:

--cap-drop ALL drops every Linux capability, including NET_ADMIN and SYS_ADMIN.
--network none leaves the container with no network interfaces; the agent communicates only through the mounted Unix socket to the host proxy.
--read-only plus --tmpfs gives an immutable root filesystem with ephemeral scratch space.
-v ...:/workspace:ro mounts the code read-only. Never mount ~/.ssh, ~/.aws, or ~/.config.
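When containers are launched from code rather than by hand, the same flags can be assembled programmatically, with a guard against mounting sensitive directories. The sketch below is one possible shape; the function names, the `home` default, and the overlap check are assumptions for illustration, while the flags themselves mirror the command above.

```python
import shlex
from pathlib import PurePosixPath

# Directories the section above says must never be mounted.
FORBIDDEN_MOUNTS = ("~/.ssh", "~/.aws", "~/.config")


def hardened_docker_args(image, workspace, proxy_socket,
                         seccomp_profile="/path/to/seccomp-profile.json",
                         home="/home/user"):
    """Build the argv for a hardened `docker run` (flags as in the command above)."""
    ws = PurePosixPath(workspace)
    for entry in FORBIDDEN_MOUNTS:
        forbidden = PurePosixPath(entry.replace("~", home, 1))
        # Reject the directory itself, anything inside it, and any parent of it.
        if ws == forbidden or forbidden in ws.parents or ws in forbidden.parents:
            raise ValueError(f"refusing to mount {workspace}: overlaps {entry}")
    return [
        "docker", "run",
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--security-opt", f"seccomp={seccomp_profile}",
        "--read-only",
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=100m",
        "--network", "none",
        "--memory", "2g",
        "--pids-limit", "100",
        "--user", "1000:1000",
        "-v", f"{workspace}:/workspace:ro",
        "-v", f"{proxy_socket}:{proxy_socket}:ro",
        image,
    ]


def render(args):
    """Shell-quoted form of the argv, for logging or review."""
    return shlex.join(args)
```

Passing the list to `subprocess.run` directly, rather than a rendered string, avoids shell interpretation of the arguments.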

gVisor

Intercepts syscalls in userspace — the agent never directly touches the host kernel.

Register the runtime in /etc/docker/daemon.json:

{ "runtimes": { "runsc": { "path": "/usr/local/bin/runsc" } } }

Then select it when starting the container:

docker run --runtime=runsc agent-image

Performance: CPU-bound ≈ 0% overhead; file I/O can be 10–200× slower for heavy open/close patterns.
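Because the overhead depends heavily on workload, it may be worth measuring it for your own access pattern. The following is a rough micro-benchmark sketch (function names and iteration counts are arbitrary choices for this note): run it once in a container with the default runtime and once with `--runtime=runsc`, then compare the two timings. It makes no claim about what numbers you will see.

```python
import os
import tempfile
import time


def time_open_close(iterations=2000):
    """Seconds spent on `iterations` open/write/close cycles on a small file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.txt")
        start = time.perf_counter()
        for _ in range(iterations):
            with open(path, "w") as fh:
                fh.write("x")
        return time.perf_counter() - start


def time_cpu(iterations=200_000):
    """Seconds spent on a pure-Python arithmetic loop (no file syscalls)."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"open/close: {time_open_close():.4f}s  cpu: {time_cpu():.4f}s")
```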

Firecracker MicroVMs

Boot time under 125 ms, with less than 5 MiB of memory overhead per microVM
Agent VM has no external network — all traffic routed via vsock to host proxy
Suitable for per-request isolation in multi-tenant systems

Cloud Deployments

Private subnet with no internet gateway
Cloud firewall (AWS security groups / GCP VPC firewall rules) blocks all egress except to the proxy
Proxy (e.g. Envoy with credential_injector) validates, allowlists, injects creds, logs
Minimal IAM permissions on agent's service account

Credential Management

Core pattern: run a proxy outside the agent's security boundary that injects credentials. The agent never sees the actual secret.

Benefits:

Credentials stored in one place, not distributed to agents
Proxy enforces endpoint allowlists
All requests logged for audit
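The three benefits above can be sketched as a single request-preparation step inside the proxy. This is a simplified illustration, not a production proxy: the function name, the `ALLOWED_HOSTS` entry, and the bearer-token header scheme are all assumptions.

```python
import logging

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical upstream the agent may reach
log = logging.getLogger("agent-proxy")


def prepare_upstream_request(host, path, headers, secret):
    """Validate the target, strip agent-supplied auth, inject the real credential."""
    if host not in ALLOWED_HOSTS:
        log.warning("blocked request to %s%s", host, path)
        raise PermissionError(f"host not allowlisted: {host}")
    # Drop any Authorization header the agent set itself (case-insensitive).
    clean = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    # Assumed scheme: the upstream expects a bearer token.
    clean["Authorization"] = f"Bearer {secret}"
    log.info("forwarding request to %s%s", host, path)  # the secret is never logged
    return clean
```

The secret is passed in by the proxy process, which would load it from its own store; nothing on the agent side of the boundary holds it.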

Proxy Configuration

Option 1 — sampling requests only:

export ANTHROPIC_BASE_URL="http://localhost:8080"

Option 2 — system-wide (all HTTP traffic):

export HTTP_PROXY="http://localhost:8080"
export HTTPS_PROXY="http://localhost:8080"

Note: with HTTP_PROXY/HTTPS_PROXY, HTTPS traffic passes through an opaque CONNECT tunnel, so the proxy cannot inspect or modify it unless it terminates TLS. Node.js fetch() ignores these variables by default; in Node 24+, set NODE_USE_ENV_PROXY=1 to enable them.
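If the agent is started from a launcher script, the two options can be expressed as a function that builds the child process environment. The sketch below is hypothetical: the `agent_env` name, the `mode` values, and the secret-suffix convention are assumptions, while the variable names come from the options above.

```python
SECRET_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET")  # assumed naming convention


def agent_env(base_env, proxy_url, mode):
    """Return a copy of base_env configured for one of the two proxy options."""
    # Drop variables that look like secrets so the agent never inherits them.
    env = {k: v for k, v in base_env.items()
           if not k.upper().endswith(SECRET_SUFFIXES)}
    if mode == "sampling":
        # Option 1: only sampling requests go through the proxy.
        env["ANTHROPIC_BASE_URL"] = proxy_url
    elif mode == "system":
        # Option 2: all HTTP(S) traffic from proxy-aware programs.
        env["HTTP_PROXY"] = proxy_url
        env["HTTPS_PROXY"] = proxy_url
        env["NODE_USE_ENV_PROXY"] = "1"  # per the note above, for Node 24+
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return env
```

The result would be passed as the `env` argument of `subprocess.run` when launching the agent, so the parent's own environment is left untouched.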

Proxy Options

Proxy	Use case
Envoy	Production; `credential_injector` filter
mitmproxy	TLS-terminating; inspect/modify HTTPS
Squid	ACL-based caching proxy
LiteLLM	LLM gateway with rate limiting

Credentials for Other Services

MCP/custom tools (preferred): Agent calls a tool; the actual authenticated request happens outside the agent boundary. No TLS interception needed.

TLS-terminating proxy: Install proxy's CA cert in agent's trust store + configure HTTP_PROXY. Use proxychains or iptables for programs that bypass env vars.

Filesystem Configuration

Files to Exclude Before Mounting

File	Risk
`.env`, `.env.local`	API keys, DB passwords
`~/.aws/credentials`	AWS access keys
`~/.config/gcloud/application_default_credentials.json`	GCP tokens
`~/.kube/config`	Kubernetes credentials
`*.pem`, `*.key`	Private keys
`.npmrc`, `.pypirc`	Registry tokens
`*-service-account.json`	GCP service account keys
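A pre-mount check can catch the project-level entries in this table automatically. The sketch below walks a directory and reports files whose names match the patterns, treating the `.pem` and `.key` entries as file extensions. The function name and the exact pattern list are assumptions; the home-directory entries (`~/.aws`, `~/.kube`, and so on) are not scanned because they are handled by not mounting the home directory at all.

```python
import fnmatch
import os

# File-name patterns derived from the table above (project-level entries only).
SENSITIVE_PATTERNS = [
    ".env", ".env.local",
    "*.pem", "*.key",
    ".npmrc", ".pypirc",
    "*-service-account.json",
]


def find_sensitive_files(root):
    """Return sorted paths (relative to root) whose names match a sensitive pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatchcase(name, p) for p in SENSITIVE_PATTERNS):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)
```

A launcher could refuse to start the container when the returned list is non-empty, or copy the workspace without those files first.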

Writable Workspace Options

Approach	Persistence	Use case
`--tmpfs`	Ephemeral (cleared on stop)	CI/CD, stateless agents
Overlay filesystem	Inspect then apply/discard	Review-before-commit workflows
Named volume (separate dir)	Persistent	Output collection
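The "inspect then apply/discard" workflow can be approximated without an overlay filesystem by letting the agent work on a copy and diffing it afterwards. The sketch below swaps in a plain directory copy for the overlay, so it only illustrates the review step, not overlayfs itself; the function names are hypothetical.

```python
import filecmp
import os
import shutil


def stage_copy(original, scratch):
    """Create the scratch copy the agent is allowed to write to."""
    shutil.copytree(original, scratch)
    return scratch


def changed_files(original, scratch):
    """Relative paths added or modified in scratch compared with original."""
    changed = []

    def walk(cmp, prefix=""):
        # right_only: new in scratch; diff_files: present in both but different.
        for name in cmp.right_only + cmp.diff_files:
            changed.append(os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(original, scratch))
    return sorted(changed)
```

After review, the listed files would either be copied back into the original tree or the scratch directory deleted. Note that `filecmp.dircmp` uses shallow comparison by default (size and modification time first), and that deletions made by the agent would appear in `left_only`, which this sketch does not report.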

Key Takeaways

Prompt injection is the primary threat — content the agent processes can redirect its behavior; built-in summarization and permissions help, but aren't sufficient alone
Proxy pattern is the gold standard for credentials — agent never sees secrets; proxy outside the boundary injects them and enforces allowlists
--network none + Unix socket is the strongest container network control — agent can only reach what the host proxy allows
gVisor for multi-tenant or untrusted content — reduces kernel attack surface significantly despite I/O overhead
Never mount sensitive credential directories — ~/.ssh, ~/.aws, ~/.config must stay outside the agent's view
ANTHROPIC_BASE_URL vs HTTP_PROXY: the former routes only sampling API calls, which the proxy receives as plaintext HTTP and can inspect or modify; the latter routes all traffic but tunnels HTTPS opaquely unless the proxy terminates TLS
Least privilege is layered — filesystem (read-only mounts) + network (allowlists) + capabilities (--cap-drop ALL) + process limits (--pids-limit)

Related

wiki/agent-sdk/configure-permissions: permission modes, allow/deny rules, evaluation order
wiki/agent-sdk/hosting-production: container requirements, deployment patterns
wiki/agent-sdk/sdk-hooks: PreToolUse/PostToolUse callbacks for runtime control
wiki/agent-sdk/mcp-integration: MCP servers as a credential-safe tool boundary
wiki/architecture/docker-compose: container orchestration context

Sources

raw/Securely deploying AI agents.md
Official docs: https://code.claude.com/docs/en/agent-sdk/secure-deployment
