Celery Prefork Pool — All Workers Fork at Startup

Celery's default prefork pool forks all CONCURRENCY worker processes immediately at startup, not lazily on first task. Each forked process loads the full Python interpreter plus all imports. With CONCURRENCY=20 and 120 MB per process, that is 2.4 GB of RAM consumed before a single task is processed — enough to OOM-kill a container and stall a pipeline for 15+ minutes while the cause is invisible in application logs.

Key Points

prefork (the default pool type) forks N processes at celery worker start time
Each process is a full Python interpreter with all imports loaded
Memory = CONCURRENCY × per_worker_MB consumed before any task runs
OOM manifests as the container being killed, not a Python exception
CONCURRENCY=1 is safe but eliminates parallelism — tune with the formula below

Details

Safe Concurrency Formula

CONCURRENCY = floor(container_memory_MB / per_worker_MB)
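As a worked sketch of the formula, the helper below computes both the startup footprint and the largest CONCURRENCY that fits. The function names, the optional reserve_mb headroom parameter, and the 2048 MB container limit are illustrative assumptions; only the 20 × 120 MB figures come from the incident described in this article.

```python
import math


def startup_memory_mb(concurrency: int, per_worker_mb: float) -> float:
    """Memory the prefork pool holds before any task runs."""
    return concurrency * per_worker_mb


def safe_concurrency(container_memory_mb: float,
                     per_worker_mb: float,
                     reserve_mb: float = 0.0) -> int:
    """floor((container_memory_mb - reserve_mb) / per_worker_mb).

    reserve_mb is an assumed headroom knob (not part of the article's
    formula); the default of 0 reproduces the formula exactly.
    A result of 0 means not even one worker fits.
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = max(0.0, container_memory_mb - reserve_mb)
    return math.floor(usable_mb / per_worker_mb)


if __name__ == "__main__":
    # Incident figures: 20 workers at ~120 MB each.
    print(startup_memory_mb(20, 120))   # 2400
    # Hypothetical 2048 MB container limit.
    print(safe_concurrency(2048, 120))  # 17
```

Leaving some reserve_mb is probably wise in practice, since the parent process and the tasks themselves also need memory, but the right amount depends on the workload.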

Measure per_worker_MB with:

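# Start one worker in the background and let it finish importing (10 s is a guess; adjust)
celery -A app worker --concurrency=1 &
sleep 10
# RSS column is in kilobytes; expect a parent row plus one pool child
ps -eo pid,ppid,rss,args | grep '[c]elery'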
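To turn the ps output into a per_worker_MB figure, a small parser along these lines may help. It is a sketch that assumes the three-column format of `ps -eo pid,rss,args` with RSS in kilobytes; the function names and the sample text are hypothetical.

```python
import subprocess

# Hypothetical sample of `ps -eo pid,rss,args` output, for illustration only.
SAMPLE = """\
  PID   RSS COMMAND
  101 122880 /usr/bin/python /usr/local/bin/celery -A app worker --concurrency=1
  102 118784 /usr/bin/python /usr/local/bin/celery -A app worker --concurrency=1
  200   3072 bash
"""


def celery_rss_mb(ps_output: str) -> dict:
    """Map PID -> RSS in MB for each row whose command mentions celery."""
    result = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, rss, rest of the command line
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # header row or malformed line
        pid, rss_kb, args = int(parts[0]), int(parts[1]), parts[2]
        if "celery" in args:
            result[pid] = rss_kb / 1024
    return result


def current_celery_rss_mb() -> dict:
    """Run ps on this machine and parse its output (requires a ps binary)."""
    out = subprocess.run(["ps", "-eo", "pid,rss,args"],
                         capture_output=True, text=True, check=True).stdout
    return celery_rss_mb(out)
```

Calling celery_rss_mb(SAMPLE) yields one entry per celery process. RSS includes pages still shared with the parent after fork, so summing the rows can overstate real usage; the figure for a single pool child is the more useful input to the formula.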

Common baselines (approximate RSS per worker process):

Pure Python FastAPI worker: ~60–80 MB
Worker that imports faster_whisper: ~400–800 MB per worker (model loaded per process)
Worker that imports torch: 300–500 MB baseline

Alternative Pool Types

Pool	Startup behaviour	Use case
`prefork` (default)	All N processes fork immediately	CPU-bound tasks
`solo`	Single-process, no fork	Dev / low-memory containers
`gevent` / `eventlet`	Green threads, shared process	I/O-bound tasks
`threads`	OS threads, shared process	I/O-bound, simpler than gevent

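Switch with the --pool option on the command line (e.g. --pool=gevent), or with CELERY_POOL=solo if the service's entrypoint maps that variable to --pool. The gevent and eventlet pools require their respective packages to be installed.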
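One way an environment variable such as CELERY_POOL could be wired into Celery is through the worker_pool and worker_concurrency settings. The sketch below is an assumption about how a project might do this, not a description of this project's entrypoint; the pool_settings helper and the fallback defaults are hypothetical. It rejects gevent and eventlet because Celery's documentation recommends selecting those with the --pool command-line option rather than the setting.

```python
import os

# Pools this sketch allows through the worker_pool setting.
SETTING_SELECTABLE_POOLS = {"prefork", "solo", "threads"}


def pool_settings(env=os.environ) -> dict:
    """Build Celery settings from CELERY_POOL and CONCURRENCY variables."""
    pool = env.get("CELERY_POOL", "prefork")
    if pool not in SETTING_SELECTABLE_POOLS:
        raise ValueError(
            f"pool {pool!r} should be selected with --pool on the command line"
        )
    # Assumed fallback of 1: the low-memory choice when nothing is configured.
    concurrency = int(env.get("CONCURRENCY", "1"))
    return {"worker_pool": pool, "worker_concurrency": concurrency}


# Usage, assuming `app` is an existing Celery application object:
#     app.conf.update(pool_settings())
```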

Stacking with ML Libraries

If a worker imports a model library at module level (e.g. faster_whisper, torch) and loads the model there, that model ends up in every forked process. With CONCURRENCY=4 and a 400 MB model, startup RAM = 1.6 GB before any inference runs. See wiki/connections/celery-prefork-faster-whisper-memory-stacking for the combined effect.

Symptoms

Container killed within 10–30 seconds of docker compose up
No Python traceback — OOM killer logs in dmesg / docker events
docker stats shows memory spike to container limit then drop (restart)
Tasks never start processing; queue builds up
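Because there is no traceback, one way to confirm an OOM kill is to read the State block that docker inspect prints as JSON. The sketch below parses that JSON; the was_oom_killed name and the sample document are hypothetical, while the OOMKilled and ExitCode fields are the ones docker inspect reports.

```python
import json

# Hypothetical, trimmed `docker inspect <container>` output for illustration.
SAMPLE_INSPECT = """
[{"Name": "/ffmpeg-worker",
  "State": {"Status": "exited", "OOMKilled": true, "ExitCode": 137}}]
"""


def was_oom_killed(inspect_json: str) -> bool:
    """Return True if the first inspected container reports an OOM kill."""
    containers = json.loads(inspect_json)  # docker inspect emits a JSON array
    if not containers:
        return False
    state = containers[0].get("State", {})
    return bool(state.get("OOMKilled", False))


if __name__ == "__main__":
    print(was_oom_killed(SAMPLE_INSPECT))  # True
```

An exit code of 137 (128 + SIGKILL) alongside OOMKilled is a common companion signal, though 137 alone only shows that the process was killed, not why.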

Real Incident (2026-04-30)

The ffmpeg-worker container was set to CONCURRENCY=20 at ~120 MB per forked process, so startup memory totalled 2.4 GB before any task was processed. The container hit its memory limit and was OOM-killed within seconds of docker compose up. The pipeline stalled for 15 minutes because the cause was invisible in application logs: there was no Python traceback, only a container restart loop. Diagnosis: docker stats showed memory spiking to the limit and then dropping immediately, repeating every ~30 seconds. Fix: reduce CONCURRENCY using the formula floor(container_memory_MB / per_worker_MB).

Related

wiki/concepts/faster-whisper-startup-memory: model loads at startup in each worker process
wiki/connections/celery-prefork-faster-whisper-memory-stacking: the combined effect when both apply
wiki/concepts/docker-compose-cpu-limits-env: memory limits in Compose override files
wiki/concepts/celery-queue-worker-specialization: specialised workers, smaller CONCURRENCY per service

Sources

daily/2026-04-30.md — Session 21:37, ffmpeg-worker OOM diagnosis; CONCURRENCY=20, 2.4 GB pre-task RAM, 15-minute pipeline stall
