| title | aliases | tags | sources | created | updated |
|---|---|---|---|---|---|
| systemd — MemoryMax, OOMPolicy, and Auto-Restart for Memory-Constrained Services | | | | 2026-04-24 | 2026-04-24 |
# systemd — MemoryMax, OOMPolicy, and Auto-Restart for Memory-Constrained Services

When a process in a systemd-managed service is killed by the Linux OOM (Out-Of-Memory) killer, systemd's defaults (`OOMPolicy=stop`, `Restart=no`) stop the unit, mark it failed with result `oom-kill`, and leave it down. A memory spike that would otherwise be transient therefore leaves a permanently dead service until someone intervenes manually. Adding `MemoryMax=`, `OOMPolicy=continue`, `Restart=always`, and `RestartSec=` changes this to graceful, automatic recovery: when the service exceeds its ceiling, the OOM kill happens inside the service's own cgroup, and `Restart=` brings the service back after a short delay.
## Key Points

- Default `OOMPolicy=stop`: when the OOM killer kills one of the service's processes, systemd stops the whole unit with result `oom-kill`; with the default `Restart=no`, the unit then stays in the `failed` state
- `OOMPolicy=continue`: the OOM kill is logged, but systemd does not stop the unit; if the killed process was the main process, the service exits anyway and `Restart=` decides whether it comes back
- `Restart=always`: the directive that actually restarts the service after it exits, including after an OOM kill
- `MemoryMax=`: sets a hard memory ceiling for the service's cgroup; when usage cannot be kept below it, the OOM killer is invoked inside that cgroup, instead of the host running out of memory and a system-wide OOM kill picking a victim among other processes
- `RestartSec=10`: adds a 10-second delay before each restart, which prevents rapid restart loops if the OOM condition persists
- This pattern is safe for idempotent services (web servers, API backends, daemons) that can be restarted without data loss
## Details

### The Four-Directive Pattern

```ini
[Service]
# Hard memory ceiling for the service's cgroup; the OOM killer acts inside it
MemoryMax=7G
# Log an OOM kill instead of stopping the whole unit
OOMPolicy=continue
# Restart whenever the service exits, except after an explicit systemctl stop
Restart=always
# Wait 10 seconds before each restart to avoid hot restart loops
RestartSec=10
```
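MemoryMax vs MemoryHigh:

- `MemoryHigh=` is a soft limit: once the cgroup's usage goes above it, the kernel throttles the service's processes and puts them under heavy reclaim pressure, but never invokes the OOM killer
- `MemoryMax=` is a hard limit: if the cgroup's usage cannot be kept below it, the OOM killer is invoked inside the cgroup
- Use `MemoryMax=` alone when you want predictable behavior at a ceiling; set `MemoryHigh=` below `MemoryMax=` as well for graduated throttling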
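For graduated throttling, `MemoryHigh=` has to sit somewhat below `MemoryMax=`. Below is a minimal sketch of deriving both values from a single ceiling, assuming a simple percentage split. The helper names `to_bytes` and `memory_limits` and the 85% default are illustrative choices, not systemd conventions. The conversion follows systemd's base-1024 size suffixes (so `7G` is 7516192768 bytes) and only handles whole numbers.

```bash
#!/usr/bin/env bash
# Sketch: derive a MemoryHigh= soft limit as a percentage of a MemoryMax= ceiling.
# Hypothetical helpers; the 85% default is an assumption, not a systemd default.

# Convert a size with an optional K/M/G/T suffix (base 1024) to bytes.
to_bytes() {
  local size=$1 num unit
  num=${size%[KMGT]}        # strip one trailing suffix letter, if any
  unit=${size#"$num"}       # the stripped suffix ("" for plain bytes)
  case $num in ''|*[!0-9]*) return 1 ;; esac   # whole numbers only
  case $unit in
    '') echo "$num" ;;
    K)  echo $(( num * 1024 )) ;;
    M)  echo $(( num * 1024 * 1024 )) ;;
    G)  echo $(( num * 1024 * 1024 * 1024 )) ;;
    T)  echo $(( num * 1024 * 1024 * 1024 * 1024 )) ;;
  esac
}

# Print MemoryHigh=/MemoryMax= lines, with MemoryHigh at PCT percent of the ceiling.
# Usage: memory_limits MAX [PCT]
memory_limits() {
  local max_bytes pct=${2:-85}
  max_bytes=$(to_bytes "$1") || return 1
  echo "MemoryHigh=$(( max_bytes * pct / 100 ))"
  echo "MemoryMax=$max_bytes"
}
```

For example, `memory_limits 7G` prints `MemoryHigh=6388763852` and `MemoryMax=7516192768`, which can go into the `[Service]` section of a drop-in; systemd accepts plain byte counts for both directives. Computing bytes explicitly sidesteps systemd's own `%` syntax for these directives, which is relative to physical memory rather than to `MemoryMax=`.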
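OOMPolicy options:

| Value | Behavior |
|---|---|
| `stop` | Default for most units (taken from `DefaultOOMPolicy=`). The OOM kill is logged and systemd stops the unit cleanly with result `oom-kill`; `Restart=` decides whether it is started again |
| `continue` | The OOM kill is logged and the unit keeps running; if the main process was the one killed, the service exits and `Restart=` applies as usual |
| `kill` | The kernel also kills all remaining processes of the unit, including forked ones (systemd sets the cgroup's `memory.oom.group` to 1) |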
### Applying to an Existing Unit
Prefer a drop-in override to modifying the unit file directly — easier to revert and survives package upgrades:
```bash
# Run as root. Create a drop-in directory for the service
mkdir -p /etc/systemd/system/immich-web.service.d/
# Write the override
cat > /etc/systemd/system/immich-web.service.d/memory.conf << 'EOF'
[Service]
MemoryMax=7G
OOMPolicy=continue
Restart=always
RestartSec=10
EOF
# Reload unit files so systemd sees the drop-in, then restart the service
systemctl daemon-reload
systemctl restart immich-web
```
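To reuse the same override for other memory-hungry services, the steps above can be wrapped in a small function. This is a sketch: `write_memory_dropin` and the `UNIT_DIR` override (which lets it be tried in a scratch directory instead of `/etc/systemd/system`) are hypothetical names, and the defaults simply mirror the Immich values used in this note.

```bash
#!/usr/bin/env bash
# Sketch: write the MemoryMax/OOMPolicy/Restart drop-in for an arbitrary unit.
# Hypothetical: the function name and the UNIT_DIR override; defaults mirror this note.

# Usage: write_memory_dropin UNIT [MEMORY_MAX] [RESTART_SEC]
# Prints the path of the file it wrote.
write_memory_dropin() {
  local unit=${1:-} max=${2:-7G} restart_sec=${3:-10}
  [ -n "$unit" ] || { echo "usage: write_memory_dropin UNIT [MEMORY_MAX] [RESTART_SEC]" >&2; return 2; }
  local dir="${UNIT_DIR:-/etc/systemd/system}/${unit}.service.d"

  mkdir -p "$dir" || return 1
  # Unquoted EOF so that ${max} and ${restart_sec} are expanded
  cat > "$dir/memory.conf" <<EOF || return 1
[Service]
MemoryMax=${max}
OOMPolicy=continue
Restart=always
RestartSec=${restart_sec}
EOF
  printf '%s\n' "$dir/memory.conf"
}
```

After writing the file, run `systemctl daemon-reload` and restart the unit as above. `systemctl edit <unit>` is systemd's built-in alternative: it opens an editor on `override.conf` in the same drop-in directory and reloads the configuration once the editor exits.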
### Verifying the Directives Loaded

```bash
systemctl show immich-web --property=MemoryMax,OOMPolicy,Restart,RestartUSec
# Expected:
# MemoryMax=7516192768   (7 * 1024^3 bytes)
# OOMPolicy=continue
# Restart=always
# RestartUSec=10s        (systemctl prints time spans in human-readable form)
```

If the output shows `MemoryMax=infinity` (some versions print 18446744073709551615, the maximum uint64), the directive didn't load: check that `daemon-reload` ran, and use `systemctl cat immich-web` to confirm the drop-in is listed.
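The same check can be scripted, which helps when the pattern is rolled out to several services. A minimal sketch: `check_oom_props` is a hypothetical helper that reads `systemctl show` output on stdin, so it can also be tried on saved output. The expected `MemoryMax` is passed in bytes (7G = 7 * 1024^3 = 7516192768), and the expected `RestartUSec` is passed in whatever form your systemctl prints it (recent versions print a time span such as `10s`).

```bash
#!/usr/bin/env bash
# Sketch: verify the OOM/restart properties from `systemctl show` output on stdin.
# Hypothetical helper. Usage: check_oom_props MAX_BYTES RESTART_USEC
check_oom_props() {
  local want_max=$1 want_restart_usec=$2 key value want seen=0 failed=0
  while IFS='=' read -r key value; do
    case $key in
      MemoryMax)   want=$want_max ;;
      OOMPolicy)   want=continue ;;
      Restart)     want=always ;;
      RestartUSec) want=$want_restart_usec ;;
      *)           continue ;;   # ignore any other property
    esac
    seen=$((seen + 1))
    if [ "$value" != "$want" ]; then
      echo "mismatch: $key=$value (expected $want)" >&2
      failed=1
    fi
  done
  # A missing property counts as a failure too
  if [ "$seen" -ne 4 ]; then
    echo "expected 4 properties, found $seen" >&2
    failed=1
  fi
  return "$failed"
}
```

Usage: `systemctl show immich-web --property=MemoryMax,OOMPolicy,Restart,RestartUSec | check_oom_props 7516192768 10s && echo OK`. The function returns non-zero and reports each mismatch on stderr.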
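### Diagnosing an OOM Kill

```bash
# Check whether systemd recorded an OOM kill for the service
journalctl -u immich-web -n 50 | grep -i oom
# Look for: "A process of this unit has been killed by the OOM killer."
# or: "Failed with result 'oom-kill'."
# Kernel-side OOM events (may need root; journalctl -k also shows kernel messages)
dmesg | grep -i "oom\|killed process"
# Look for: "Memory cgroup out of memory: Killed process ..." (cgroup limit hit)
# or: "Out of memory: Killed process ..." (system-wide OOM)
# Check current memory usage of the service
systemctl status immich-web
# Shows a line like: Memory: X.XG (max: 7.0G ...); the exact format varies by version
```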
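To see how often a service has been OOM-killed over a longer window than `-n 50`, the matching lines can be counted. A sketch, assuming the message wording of recent systemd and kernel versions; it may differ on older releases, so a count of zero means "no matching message", not proof that no OOM kill happened. `oom_summary` is a hypothetical helper that reads journal text on stdin.

```bash
#!/usr/bin/env bash
# Sketch: count OOM-related journal lines read on stdin.
# Assumption: message texts match recent systemd/kernel wording.
oom_summary() {
  awk '
    /killed by the OOM killer/      { unit_oom++ }   # systemd, per unit
    /Failed with result .oom-kill./ { failed++ }     # systemd unit result
    /Killed process [0-9]+/         { kernel++ }     # kernel OOM killer
    END {
      printf "unit_oom_kills=%d\n", unit_oom
      printf "failed_oom_kill=%d\n", failed
      printf "kernel_killed_process=%d\n", kernel
    }'
}
```

For example, `journalctl -u immich-web --since yesterday -o cat | oom_summary` covers the systemd side, and `journalctl -k --since yesterday | oom_summary` covers kernel messages.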
### When to Use This Pattern
Appropriate for:
- Web servers / API backends — stateless, can restart cleanly
- Photo services (Immich) — ML processing creates RAM spikes; restart is safe
- Background daemons — should be self-healing in production
Not appropriate for:
- Databases: an OOM kill mid-write can corrupt data; use `MemoryMax=` + `OOMPolicy=stop` + alerting instead
- Services holding locks: a restart may leave stale locks
- Services where a restart loop would cause an outage (consider `StartLimitIntervalSec=` and `StartLimitBurst=` to cap restart attempts)
### Capping Restart Attempts
To prevent infinite restart loops when the OOM condition is persistent:
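```ini
[Unit]
# Allow at most 3 starts within 300 seconds (the initial start counts);
# a further start inside that window is refused
StartLimitIntervalSec=300
StartLimitBurst=3

[Service]
OOMPolicy=continue
Restart=always
RestartSec=30
```

If the service keeps dying, the third OOM kill within 5 minutes of the first start uses up the budget: systemd refuses the next restart and the unit enters the failed state with result `start-limit-hit`, signaling that the root cause needs attention rather than perpetual restarts.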
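A start limit only ends a loop if the restarts can actually land inside the interval. A rough sketch of that arithmetic, assuming each failing run dies almost immediately, so consecutive starts are at least `RestartSec=` apart and the refused (burst + 1)-th start comes `StartLimitBurst * RestartSec` seconds after the first. `start_limit_can_trip` is a hypothetical helper; all values are whole seconds.

```bash
#!/usr/bin/env bash
# Sketch: can StartLimitIntervalSec/StartLimitBurst ever stop a Restart= loop?
# Assumption: every failing run dies almost at once, so starts are >= RestartSec apart.
# The refused start is the (burst+1)-th, at least burst*RestartSec after the first;
# it is only refused if that still falls inside the interval.
# Exit 0 if the limit can trip, non-zero if restarts are too slow to ever hit it.
# Usage: start_limit_can_trip INTERVAL BURST RESTART_SEC
start_limit_can_trip() {
  local interval=$1 burst=$2 restart_sec=$3
  [ $(( burst * restart_sec )) -lt "$interval" ]
}
```

With the values above, `start_limit_can_trip 300 3 30` succeeds (90 s < 300 s). With systemd's defaults (`DefaultStartLimitIntervalSec=10s`, `DefaultStartLimitBurst=5`) and `RestartSec=10`, `start_limit_can_trip 10 5 10` fails (50 s is not below 10 s), so the default limit never ends the loop and explicit values are needed for the cap to take effect. Once the root cause is fixed, `systemctl reset-failed immich-web` clears the counter so the unit can be started again right away.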
## Related Concepts
- wiki/concepts/immich-lxc-ram-oom — the Immich-specific application of this pattern (OOM recovery for Immich in LXC)
- wiki/concepts/ollama-lxc-ram-requirements — RAM sizing for another memory-hungry service in LXC (Ollama)
- wiki/concepts/beszel-monitoring-deployment — monitoring service RAM usage to catch OOM-prone services before they crash
- wiki/homelab/_index — homelab context where this pattern is applied
## Sources

- daily/2026-04-24.md (Immich OOM recovery): `immich-web.service` killed at 5.1/6 GB; `MemoryMax=7G`, `OOMPolicy=continue`, `RestartSec=10` added to unit; pattern generalized here for reuse across other memory-constrained homelab services