---
title: "Securely Deploying AI Agents"
aliases: [agent-security, secure-agent-deployment, agent-hardening]
tags: [security, agent-sdk, deployment, isolation, credentials, docker, proxy]
sources: [raw/Securely deploying AI agents.md]
created: 2026-04-17
updated: 2026-04-17
---

# Securely Deploying AI Agents

Unlike deterministic software, Claude Code and the Agent SDK generate actions dynamically based on context — making them susceptible to **prompt injection**: malicious instructions embedded in files, webpages, or user input that redirect agent behavior. Defense in depth is the answer.

Not every deployment needs maximum hardening. A developer running locally has different needs than a multi-tenant production system processing untrusted content.

## Threat Model

- **Prompt injection** — content processed by the agent (READMEs, web pages, files) may contain adversarial instructions
- **Model error** — unexpected actions even without adversarial input
- **Credential exposure** — agents accessing APIs may leak secrets if not isolated
- **Resource abuse** — unbounded memory/CPU/process spawning in multi-tenant environments

## Built-in Security Features

| Feature | What it does |
|---------|-------------|
| **Permissions system** | Allow/block/prompt per tool or bash command; glob patterns; org-wide policies |
| **Command AST parsing** | Parses bash into AST before execution; unrecognized constructs and `eval` always require approval |
| **Web search summarization** | Summarizes search results instead of passing raw HTML into context |
| **Sandbox mode** | Optional OS-level filesystem + network restrictions (see [[wiki/agent-sdk/configure-permissions\|configure-permissions]]) |

## Security Principles

### Least Privilege

| Resource | Restriction |
|----------|------------|
| Filesystem | Mount only needed dirs; prefer read-only |
| Network | Restrict to specific endpoints via proxy |
| Credentials | Inject via proxy — never expose directly |
| System capabilities | Drop Linux capabilities in containers |

### Defense in Depth

Layer multiple controls: container isolation → network restrictions → filesystem controls → proxy-level request validation. Each layer limits blast radius if another fails.

## Isolation Technologies

| Technology | Isolation | Perf overhead | Complexity |
|-----------|-----------|--------------|------------|
| sandbox-runtime | Good | Very low | Low |
| Docker containers | Setup-dependent | Low | Medium |
| gVisor | Excellent | Medium–High | Medium |
| VMs (Firecracker/QEMU) | Excellent | High | Medium–High |

### sandbox-runtime

Lightweight, no Docker needed. Uses OS primitives (`bubblewrap` on Linux, `sandbox-exec` on macOS).

```sh
npm install @anthropic-ai/sandbox-runtime
```

- Filesystem: restricts read/write to configured paths
- Network: routes all traffic through built-in proxy with domain allowlists
- Limitation: shares host kernel — not suitable for kernel-level isolation requirements

### Hardened Docker Container

```sh
docker run \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --security-opt seccomp=/path/to/seccomp-profile.json \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=100m \
  --network none \
  --memory 2g \
  --pids-limit 100 \
  --user 1000:1000 \
  -v /path/to/code:/workspace:ro \
  -v /var/run/proxy.sock:/var/run/proxy.sock:ro \
  agent-image
```

Key flags:
- `--cap-drop ALL` — removes `NET_ADMIN`, `SYS_ADMIN`, etc.
- `--network none` — no network interfaces; agent communicates only via mounted Unix socket to host proxy
- `--read-only` + `--tmpfs` — immutable root fs with ephemeral scratch space
- `-v ...:/workspace:ro` — mount only the code the agent needs, read-only; never mount `~/.ssh`, `~/.aws`, `~/.config`

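The flags above can be checked mechanically before a container starts. The sketch below is one way to do that in plain `sh`. `audit_docker_args` is a hypothetical helper (not part of Docker or the Agent SDK), and it assumes flags are written in the space-separated form used in the example (`--cap-drop ALL`, not `--cap-drop=ALL`).

```sh
#!/bin/sh
# Hypothetical pre-flight check: report hardening flags that are absent
# from a `docker run` argument list, and flag sensitive mounts.
# Assumes space-separated flags, as in the example above.
audit_docker_args() {
    args=" $* "
    status=0
    for flag in "--cap-drop ALL" "--security-opt no-new-privileges" \
                "--read-only" "--network none" "--pids-limit" \
                "--memory" "--user"; do
        case "$args" in
            *" $flag "*) ;;                          # flag present
            *) echo "missing: $flag"; status=1 ;;    # flag absent
        esac
    done
    # Crude substring match on the credential directories named above.
    case "$args" in
        */.ssh*|*/.aws*|*/.config*)
            echo "sensitive mount detected"
            status=1
            ;;
    esac
    return $status
}
```

A launch script could call `audit_docker_args "$@" && docker run "$@"` so that a weakened invocation fails before any container is created. The substring check is deliberately blunt; it will also reject harmless paths that happen to contain `/.config`.
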
### gVisor

gVisor intercepts the agent's syscalls in userspace, so the agent never directly touches the host kernel.

Register the runtime in `/etc/docker/daemon.json`:

```json
{ "runtimes": { "runsc": { "path": "/usr/local/bin/runsc" } } }
```

```sh
docker run --runtime=runsc agent-image
```

Performance: CPU-bound work runs at ≈ 0% overhead; file I/O can be 10–200× slower for heavy open/close patterns.

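To see whether the file I/O penalty matters for a given workload, a small churn loop can be timed under both runtimes. The following is a rough sketch, not a benchmark suite. `io_churn` is a hypothetical helper, and `date +%s` only gives whole-second resolution, so it needs a large iteration count to show a difference.

```sh
#!/bin/sh
# Hypothetical micro-benchmark: create, read, and delete N small files,
# then print the elapsed time in whole seconds.
io_churn() {
    n=$1
    dir=$(mktemp -d)
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        echo data > "$dir/f$i"        # open + write + close
        cat "$dir/f$i" > /dev/null    # open + read + close
        rm "$dir/f$i"                 # unlink
        i=$((i + 1))
    done
    end=$(date +%s)
    rmdir "$dir"
    echo $((end - start))
}
```

Running the same call (for example `io_churn 20000`) inside a container started with the default runtime and again with `--runtime=runsc` gives a like-for-like comparison on the deployment's own hardware.
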
### Firecracker MicroVMs

- Boots in under 125 ms with less than 5 MiB of memory overhead per microVM
- Agent VM has no external network — all traffic routed via `vsock` to host proxy
- Suitable for per-request isolation in multi-tenant systems

### Cloud Deployments

1. Private subnet with no internet gateway
2. Cloud firewall (AWS SG / GCP VPC) blocks all egress except to proxy
3. Proxy (e.g. Envoy with `credential_injector`) validates, allowlists, injects creds, logs
4. Minimal IAM permissions on agent's service account

## Credential Management

**Core pattern:** run a proxy *outside* the agent's security boundary that injects credentials. The agent never sees the actual secret.

Benefits:
- Credentials stored in one place, not distributed to agents
- Proxy enforces endpoint allowlists
- All requests logged for audit

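The allowlist-and-log decision a proxy makes for each request can be sketched in a few lines of `sh`. This only illustrates the logic and does not replace a real proxy. `ALLOWED_HOSTS`, `host_allowed`, and `check_request` are hypothetical names, and the host list is an assumed example.

```sh
#!/bin/sh
# Hypothetical allowlist; a real deployment would load this from the
# proxy's own configuration rather than a shell variable.
ALLOWED_HOSTS="api.anthropic.com api.github.com"

# Succeed only when the host matches an allowlist entry exactly.
host_allowed() {
    for allowed in $ALLOWED_HOSTS; do
        [ "$1" = "$allowed" ] && return 0
    done
    return 1
}

# Log every decision (the audit trail), then allow or deny.
check_request() {
    if host_allowed "$1"; then
        echo "ALLOW $1"
    else
        echo "DENY $1"
        return 1
    fi
}
```

Exact-match comparison is the conservative choice here: suffix or wildcard matching is easier to get wrong, since a pattern meant for one domain can also match an attacker-controlled lookalike.
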
### Proxy Configuration

**Option 1 — sampling requests only:**
```sh
export ANTHROPIC_BASE_URL="http://localhost:8080"
```

**Option 2 — system-wide (all HTTP traffic):**
```sh
export HTTP_PROXY="http://localhost:8080"
export HTTPS_PROXY="http://localhost:8080"
```

Note: with `HTTP_PROXY`/`HTTPS_PROXY`, HTTPS requests pass through the proxy as opaque TLS tunnels, so the proxy can't inspect or modify them without TLS termination. Node.js `fetch()` ignores these variables by default; set `NODE_USE_ENV_PROXY=1` in Node 24+.

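Because the two options behave differently, it can help for a launch script to report which one is in effect. The sketch below assumes only the environment variables named above; `proxy_mode` and `node_fetch_honors_proxy` are hypothetical helper names.

```sh
#!/bin/sh
# Print which routing option the current environment selects.
# When both are set, the system-wide proxy is reported because it
# covers all HTTP traffic.
proxy_mode() {
    if [ -n "${HTTPS_PROXY:-}${HTTP_PROXY:-}" ]; then
        echo "system-wide"
    elif [ -n "${ANTHROPIC_BASE_URL:-}" ]; then
        echo "sampling-only"
    else
        echo "none"
    fi
}

# Succeed only when Node.js has been told to read the proxy variables.
node_fetch_honors_proxy() {
    [ "${NODE_USE_ENV_PROXY:-}" = "1" ]
}
```

A wrapper might refuse to start the agent when `proxy_mode` prints `none`, or warn when it prints `system-wide` and `node_fetch_honors_proxy` fails, since Node-based tools would then bypass the proxy.
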
### Proxy Options

| Proxy | Use case |
|-------|---------|
| Envoy | Production; `credential_injector` filter |
| mitmproxy | TLS-terminating; inspect/modify HTTPS |
| Squid | ACL-based caching proxy |
| LiteLLM | LLM gateway with rate limiting |

### Credentials for Other Services

**MCP/custom tools (preferred):** Agent calls a tool; the actual authenticated request happens outside the agent boundary. No TLS interception needed.

**TLS-terminating proxy:** Install proxy's CA cert in agent's trust store + configure `HTTP_PROXY`. Use `proxychains` or iptables for programs that bypass env vars.

## Filesystem Configuration

### Files to Exclude Before Mounting

| File | Risk |
|------|------|
| `.env`, `.env.local` | API keys, DB passwords |
| `~/.aws/credentials` | AWS access keys |
| `~/.config/gcloud/application_default_credentials.json` | GCP tokens |
| `~/.kube/config` | Kubernetes credentials |
| `*.pem`, `*.key` | Private keys |
| `.npmrc`, `.pypirc` | Registry tokens |
| `*-service-account.json` | GCP service account keys |

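The table above translates directly into a `find` scan that can run before a directory is mounted. This is a minimal sketch that covers only the patterns listed in the table. `find_sensitive_files` and `is_safe_to_mount` are hypothetical helper names.

```sh
#!/bin/sh
# List files under a directory that match the credential patterns
# from the table above. Prints one path per line.
find_sensitive_files() {
    find "$1" -type f \( \
        -name '.env' -o -name '.env.local' \
        -o -name '*.pem' -o -name '*.key' \
        -o -name '.npmrc' -o -name '.pypirc' \
        -o -name '*-service-account.json' \
        -o -name 'application_default_credentials.json' \
        -o -path '*/.aws/credentials' \
        -o -path '*/.kube/config' \
    \) -print
}

# Succeed only when the scan finds nothing.
is_safe_to_mount() {
    [ -z "$(find_sensitive_files "$1")" ]
}
```

A typical use would be `is_safe_to_mount /path/to/code || exit 1` just before the `docker run` call. A clean scan is a necessary check rather than a guarantee, because secrets can also sit in files with unremarkable names.
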
### Writable Workspace Options

| Approach | Persistence | Use case |
|----------|-------------|---------|
| `--tmpfs` | Ephemeral (cleared on stop) | CI/CD, stateless agents |
| Overlay filesystem | Inspect then apply/discard | Review-before-commit workflows |
| Named volume (separate dir) | Persistent | Output collection |

## Key Takeaways

- **Prompt injection is the primary threat** — content the agent processes can redirect its behavior; built-in summarization and permissions help, but aren't sufficient alone
- **Proxy pattern is the gold standard for credentials** — agent never sees secrets; proxy outside the boundary injects them and enforces allowlists
- **`--network none` + Unix socket** is the strongest container network control — agent can only reach what the host proxy allows
- **gVisor for multi-tenant or untrusted content** — reduces kernel attack surface significantly despite I/O overhead
- **Never mount sensitive credential directories** — `~/.ssh`, `~/.aws`, `~/.config` must stay outside the agent's view
- **`ANTHROPIC_BASE_URL` vs `HTTP_PROXY`** — the former routes only sampling calls, which the proxy can read in plaintext; the latter routes all traffic but tunnels HTTPS opaquely
- **Least privilege is layered** — filesystem (read-only mounts) + network (allowlists) + capabilities (`--cap-drop ALL`) + process limits (`--pids-limit`)

## Related

- [[wiki/agent-sdk/configure-permissions\|configure-permissions]] — permission modes, allow/deny rules, evaluation order
- [[wiki/agent-sdk/hosting-production\|hosting-production]] — container requirements, deployment patterns
- [[wiki/agent-sdk/sdk-hooks\|sdk-hooks]] — PreToolUse/PostToolUse callbacks for runtime control
- [[wiki/agent-sdk/mcp-integration\|mcp-integration]] — MCP servers as a credential-safe tool boundary
- [[wiki/architecture/docker-compose\|Docker Compose patterns]] — container orchestration context

## Sources

- `raw/Securely deploying AI agents.md`
- Official docs: `https://code.claude.com/docs/en/agent-sdk/secure-deployment`