systemd — MemoryMax, OOMPolicy, and Auto-Restart for Memory-Constrained Services

When a process in a systemd-managed service is killed by the Linux OOM (Out-Of-Memory) killer, systemd's defaults (OOMPolicy=stop together with Restart=no) stop the unit and leave it in a failed state. A memory spike that would otherwise be transient therefore leaves the service dead until someone intervenes manually. Adding three directives on top of Restart=always (MemoryMax, OOMPolicy=continue, and RestartSec) changes this to automatic recovery: Restart= is what brings the service back, while the other three control where the OOM kill lands, how systemd reacts to it, and how quickly the restart happens.

Key Points

Default OOMPolicy=stop: when the OOM killer kills one of a service's processes, systemd logs the event and stops the whole unit, which ends with result oom-kill; with the default Restart=no it then stays in the failed state
OOMPolicy=continue: systemd logs the OOM kill but does not stop the unit on its own; if the killed process was the main process, the service exits and is restarted according to the Restart= policy
MemoryMax=: sets a hard memory ceiling for the service's cgroup; when the cgroup reaches the limit and the kernel cannot reclaim enough memory, the OOM killer picks a victim inside this cgroup, instead of the whole host running short and a process from another service being killed
RestartSec=10: waits 10 seconds before each restart, which avoids a tight restart loop if the OOM condition persists
This pattern is safe for idempotent services (web servers, API backends, daemons) that can be restarted without data loss

Details

The Three-Directive Pattern

[Service]
# Hard memory ceiling — OOM fires within the service's cgroup
MemoryMax=7G

# Log the OOM kill but do not stop the unit; Restart= below handles recovery
OOMPolicy=continue

# Always restart if the process exits for any reason
Restart=always

# Wait 10 seconds before restart — prevents hot restart loops
RestartSec=10

MemoryMax vs MemoryHigh:

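MemoryHigh is a soft (throttling) limit: when usage goes above it, the kernel slows the service's processes down and reclaims their memory aggressively, but does not kill them
MemoryMax is a hard limit: when the cgroup hits it and memory cannot be reclaimed, the OOM killer is invoked inside the cgroup
Use MemoryMax alone when you want predictable behavior at a ceiling; set MemoryHigh below MemoryMax as well for graduated throttling before the kill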
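
As a minimal sketch of the "use both" option, the helper below prints a [Service] fragment with MemoryHigh set to a percentage of MemoryMax. The function name print_memory_limits and the 85% ratio are illustrative assumptions, not values from the original setup; pick a ratio that matches how much headroom the service needs.

```shell
#!/bin/sh
# Print a [Service] fragment with a soft limit below the hard ceiling.
# Arguments: hard limit in GiB, soft limit as a percentage of the hard limit.
# The 85% ratio used in the call below is an assumption for illustration.
print_memory_limits() {
    max_gib=$1
    high_pct=$2
    # Work in MiB so integer arithmetic keeps enough precision.
    high_mib=$(( max_gib * 1024 * high_pct / 100 ))
    printf '[Service]\n'
    printf 'MemoryHigh=%sM\n' "$high_mib"
    printf 'MemoryMax=%sG\n' "$max_gib"
}

print_memory_limits 7 85
```

The printed fragment can be redirected into a drop-in file such as the memory.conf override used later on this page.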

OOMPolicy options:

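Value	Behavior
`stop`	OOM kill is logged and systemd terminates the unit cleanly; the service ends with result `oom-kill` (default)
`continue`	OOM kill is logged and the unit is left running; if the main process was the one killed, `Restart=` decides what happens next
`kill`	Kernel kills all remaining processes in the unit's cgroup (systemd sets `memory.oom.group=1`)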

Applying to an Existing Unit

Prefer a drop-in override to modifying the unit file directly — easier to revert and survives package upgrades:

# Create a drop-in directory for the service
mkdir -p /etc/systemd/system/immich-web.service.d/

# Write the override
cat > /etc/systemd/system/immich-web.service.d/memory.conf << 'EOF'
[Service]
MemoryMax=7G
OOMPolicy=continue
Restart=always
RestartSec=10
EOF

# Apply
systemctl daemon-reload
systemctl restart immich-web

Verifying the Directives Loaded

systemctl show immich-web --property=MemoryMax,OOMPolicy,Restart,RestartUSec
# Expected:
# MemoryMax=7516192768  (7 * 1024^3)
# OOMPolicy=continue
# Restart=always
# RestartUSec=10s  (systemctl show prints the microsecond value as a time span)

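If MemoryMax=infinity is reported (older systemd versions print 18446744073709551615, the max uint64), the directive didn't load. Check that daemon-reload ran and that the drop-in appears in the output of systemctl cat immich-web.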
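
The check above can be scripted. The following is a sketch, assuming a systemd version that supports systemctl show --value; the helper names memory_max_is_set and bytes_to_gib are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Return 0 if the given MemoryMax value represents a real limit,
# 1 if it is empty or means "unlimited" (either spelling).
memory_max_is_set() {
    case "$1" in
        ''|infinity|18446744073709551615) return 1 ;;
        *) return 0 ;;
    esac
}

# Convert a byte count to whole GiB for a quick sanity check.
bytes_to_gib() {
    echo $(( $1 / 1073741824 ))
}

# Usage on a live system (needs systemd, so left commented out here):
# value=$(systemctl show immich-web --property=MemoryMax --value)
# if memory_max_is_set "$value"; then
#     echo "MemoryMax is $(bytes_to_gib "$value") GiB"
# else
#     echo "MemoryMax not applied; check the drop-in and daemon-reload"
# fi
```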

Diagnosing an OOM Kill

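# Check if a service was OOM killed (systemd's own log lines for the unit)
journalctl -u immich-web -n 50 | grep -i oom
# Look for: "A process of this unit has been killed by the OOM killer"
# or "Failed with result 'oom-kill'"

# Kernel-level OOM events (exact wording varies by kernel version)
dmesg | grep -i "oom\|killed process"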

# Check current memory usage of the service
systemctl status immich-web
# Shows: Memory: X.XG (max: 7.0G) when MemoryMax is set; exact fields vary by systemd version
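
Log wording changes between versions, so a counter can be a more stable signal. On cgroup v2, each cgroup exposes a memory.events file whose oom_kill line counts processes killed by the OOM killer in that cgroup. The sketch below reads that counter; the helper name oom_kill_count is hypothetical, and the commented usage assumes cgroup v2 mounted at /sys/fs/cgroup and a unit that is currently running (the cgroup directory may disappear once the unit is stopped).

```shell
#!/bin/sh
# Print the oom_kill counter from a cgroup v2 memory.events file.
# Prints 0 if the file has no oom_kill line.
oom_kill_count() {
    awk '$1 == "oom_kill" { n = $2 } END { print n + 0 }' "$1"
}

# Usage on a live system (left commented out; needs systemd and cgroup v2):
# cg=$(systemctl show immich-web --property=ControlGroup --value)
# oom_kill_count "/sys/fs/cgroup${cg}/memory.events"
#
# With OOMPolicy=stop or kill, the unit's result is also recorded:
# systemctl show immich-web --property=Result   # Result=oom-kill after an OOM kill
```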

When to Use This Pattern

Appropriate for:

Web servers / API backends — stateless, can restart cleanly
Photo services (Immich) — ML processing creates RAM spikes; restart is safe
Background daemons — should be self-healing in production

Not appropriate for:

Databases — OOM mid-write can corrupt data; use MemoryMax + OOMPolicy=stop + alerting instead
Services holding locks — restart may leave stale locks
Services where a restart loop would cause an outage (consider StartLimitIntervalSec and StartLimitBurst to cap restart attempts)

Capping Restart Attempts

To prevent infinite restart loops when the OOM condition is persistent:

[Service]
OOMPolicy=continue
Restart=always
RestartSec=30

[Unit]
# At most 3 starts within 5 minutes; a further start attempt in that window is refused
StartLimitIntervalSec=300
StartLimitBurst=3

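Once the unit would be started more than 3 times within 5 minutes, systemd refuses the next restart and the unit stays in the failed state, signaling that the root cause needs attention rather than perpetual restarts. Run systemctl reset-failed immich-web to clear the start counter before starting the service again by hand.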
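
Whether the cap ever triggers depends on how long the service survives between kills. As a rough sketch under a simplified model (each cycle is the time the service runs before being killed plus RestartSec, ignoring start and stop time), the limit trips when StartLimitBurst cycles fit inside StartLimitIntervalSec. The function name start_limit_trips and the sample runtimes are assumptions for illustration.

```shell
#!/bin/sh
# Return 0 if the start limit would be reached under the simplified model,
# 1 otherwise. Arguments are in seconds, except burst which is a count:
#   runtime before each OOM kill, RestartSec, StartLimitIntervalSec, StartLimitBurst
start_limit_trips() {
    runtime=$1
    restart_sec=$2
    interval=$3
    burst=$4
    cycle=$(( runtime + restart_sec ))
    # The (burst + 1)-th start happens burst cycles after the first one.
    [ $(( burst * cycle )) -le "$interval" ]
}

# Service dies 60 s after each start, with the settings from the unit above.
if start_limit_trips 60 30 300 3; then
    echo "start limit reached: unit stays failed"
else
    echo "restarts continue indefinitely"
fi
```

With RestartSec=30, StartLimitIntervalSec=300, and StartLimitBurst=3, this model says the cap only engages if the service is killed within roughly 70 seconds of each start; a service that survives longer than that keeps restarting, so alerting on repeated OOM kills is still worthwhile.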

wiki/concepts/immich-lxc-ram-oom — the Immich-specific application of this pattern (OOM recovery for Immich in LXC)
wiki/concepts/ollama-lxc-ram-requirements — RAM sizing for another memory-hungry service in LXC (Ollama)
wiki/concepts/beszel-monitoring-deployment — monitoring service RAM usage to catch OOM-prone services before they crash
wiki/homelab/_index — homelab context where this pattern is applied

Sources

daily/2026-04-24.md — Immich OOM recovery: immich-web.service killed at 5.1/6 GB; MemoryMax=7G, OOMPolicy=continue, RestartSec=10 added to unit; pattern generalized here for reuse across other memory-constrained homelab services
