| title | aliases | tags | sources | created | updated |
|---|---|---|---|---|---|
| Celery Prefork Pool — All Workers Fork at Startup |  |  |  | 2026-04-30 | 2026-04-30 |
# Celery Prefork Pool — All Workers Fork at Startup
Celery's default `prefork` pool forks all `CONCURRENCY` worker processes immediately at startup, not lazily on the first task. Each forked child is a full Python interpreter carrying every module the app imports. With `CONCURRENCY=20` and 120 MB per process, that is 20 × 120 MB = 2.4 GB of RAM consumed before a single task is processed: enough to OOM-kill a container and stall a pipeline for 15+ minutes while the cause stays invisible in application logs.
## Key Points
- `prefork` (the default pool type) forks N processes at `celery worker` start time
- Each process is a full Python interpreter with all imports loaded
- Memory = `CONCURRENCY × per_worker_MB`, consumed before any task runs
- OOM manifests as the container being killed, not as a Python exception
- `CONCURRENCY=1` is safe but eliminates parallelism; tune with the formula below
## Details

### Safe Concurrency Formula
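```
CONCURRENCY = floor(container_memory_MB / per_worker_MB)
```

Treat the result as an upper bound: the prefork parent process also holds memory alongside its N children, so leave some headroom below the container limit.

Measure `per_worker_MB` with: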
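```bash
# Start one worker (parent + 1 forked child), then check resident memory (RSS)
celery -A app worker --concurrency=1 &
sleep 10  # give the worker time to finish its imports and fork
# RSS is column 6, in KiB; '[c]elery' stops grep from matching itself
ps aux | grep '[c]elery'
```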
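As an alternative to reading `ps` output by eye, the sketch below reads resident memory straight from `/proc`. It assumes a Linux host or container with `/proc` mounted; `rss_mb` and `celery_pids` are hypothetical helpers, and matching "celery" in the command line is a heuristic, not anything Celery provides.

```python
import os


def rss_mb(pid: int) -> float:
    """Return the resident set size of `pid` in MiB, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like "VmRSS:    123456 kB" (the kernel's kB is KiB)
                return int(line.split()[1]) / 1024
    return 0.0  # kernel threads have no VmRSS line


def celery_pids(pattern: str = "celery") -> list:
    """Hypothetical helper: PIDs whose command line contains `pattern`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue  # skip non-process entries and this script itself
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            continue  # process exited or is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    for pid in celery_pids():
        try:
            print(f"PID {pid}: {rss_mb(pid):.1f} MiB")
        except (FileNotFoundError, PermissionError, ProcessLookupError):
            pass  # process exited between listing and reading
```

Run it after starting the single worker above; with `--concurrency=1` it typically lists the parent and its one child. RSS counts copy-on-write pages shared between parent and child in both processes, so summing these figures overstates the real total, which errs on the safe side for the formula.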
Common per-process baselines (approximate):
- Pure Python FastAPI worker: ~60–80 MB
- Worker that imports `faster_whisper`: ~400–800 MB per worker (model loaded per process)
- Worker that imports `torch`: ~300–500 MB baseline
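The formula and baselines can be checked with a few lines of arithmetic. This is a minimal sketch: `safe_concurrency` and `startup_mb` are hypothetical names, and the optional `parent_mb` and `headroom_mb` terms are assumptions added here to reserve room for the prefork main process and per-task growth. The 2048 MB and 4096 MB limits are illustrative, not taken from the incident.

```python
import math


def startup_mb(concurrency: int, per_worker_mb: float) -> float:
    """Memory held by the forked children before any task runs."""
    return concurrency * per_worker_mb


def safe_concurrency(container_mb: float, per_worker_mb: float,
                     parent_mb: float = 0.0, headroom_mb: float = 0.0) -> int:
    """floor(usable_memory / per_worker_mb); 0 means not even one worker fits.

    parent_mb and headroom_mb are assumptions, not part of the note's formula.
    With both left at 0 this is exactly floor(container_mb / per_worker_mb).
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable = container_mb - parent_mb - headroom_mb
    return max(0, math.floor(usable / per_worker_mb))


if __name__ == "__main__":
    print(startup_mb(20, 120))          # 2400: the incident's pre-task footprint in MB
    print(safe_concurrency(2048, 120))  # 17 under a hypothetical 2048 MB limit
    print(safe_concurrency(4096, 800))  # 5 for faster_whisper at the upper baseline
```

Returning 0 rather than clamping to 1 makes "the container is too small for even one worker" visible instead of silently producing a setting that will still OOM.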
### Alternative Pool Types

| Pool | Startup behaviour | Use case |
|---|---|---|
| `prefork` (default) | All N processes fork immediately | CPU-bound tasks |
| `solo` | Single process, no fork | Dev / low-memory containers |
| `gevent` / `eventlet` | Green threads, shared process | I/O-bound tasks |
| `threads` | OS threads, shared process | I/O-bound, simpler than gevent |
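Switch with `--pool=solo` or `--pool=gevent` on the `celery worker` command line, or via `CELERY_POOL=solo` where the deployment maps that variable onto the pool option. `gevent` and `eventlet` also require their packages to be installed.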
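If the pool choice should come from the environment, one option is a Celery config module. The sketch below is hypothetical: it assumes the app loads it via `app.config_from_object("celeryconfig")` and treats `CELERY_POOL` and `CONCURRENCY` as deployment conventions rather than variables Celery reads on its own. `worker_pool` and `worker_concurrency` are Celery setting names; if a given Celery version does not honour `worker_pool` from config, the `--pool` CLI option remains the unambiguous route.

```python
"""celeryconfig.py (hypothetical): map environment variables to pool settings."""
import os
from collections.abc import Mapping

# Pools listed in this note; gevent/eventlet need their packages installed.
_valid_pools = {"prefork", "solo", "threads", "gevent", "eventlet"}


def pool_settings(env: Mapping) -> dict:
    """Build Celery pool settings from an environment mapping."""
    pool = env.get("CELERY_POOL", "prefork")  # prefork is Celery's default pool
    if pool not in _valid_pools:
        raise ValueError(f"unknown pool: {pool!r}")
    # Defaulting to 1 (a conservative choice; Celery's own default is the CPU
    # count) means an unset variable cannot fork N copies of a heavy worker.
    concurrency = int(env.get("CONCURRENCY", "1"))
    if concurrency < 1:
        raise ValueError("CONCURRENCY must be >= 1")
    return {"worker_pool": pool, "worker_concurrency": concurrency}


# config_from_object() picks up module-level lowercase names as settings.
_settings = pool_settings(os.environ)
worker_pool = _settings["worker_pool"]
worker_concurrency = _settings["worker_concurrency"]
```

With `solo`, tasks run one at a time in the main process, so `worker_concurrency` has no effect there.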
### Stacking with ML Libraries

If a worker imports a model library at module level (e.g. `faster_whisper`, `torch`) and loads the model there, that model ends up in every forked process. With `CONCURRENCY=4` and a 400 MB model, startup RAM = 4 × 400 MB = 1.6 GB before any inference runs. See wiki/connections/celery-prefork-faster-whisper-memory-stacking.
### Symptoms

- Container killed within 10–30 seconds of `docker compose up`
- No Python traceback; OOM killer entries appear in `dmesg` / `docker events`
- `docker stats` shows memory spiking to the container limit, then dropping (restart)
- Tasks never start processing; the queue builds up
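Because the application logs stay silent, the kernel's own counters are a more direct check. This sketch assumes cgroup v2, where a container's `memory.events` file (usually `/sys/fs/cgroup/memory.events` inside the container) includes an `oom_kill` counter; `parse_memory_events` and `oom_kills` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

DEFAULT_EVENTS = "/sys/fs/cgroup/memory.events"  # cgroup v2 path inside a container


def parse_memory_events(text: str) -> dict:
    """Parse memory.events 'key value' lines, e.g. 'oom_kill 2', into a dict."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events


def oom_kills(path: str = DEFAULT_EVENTS) -> Optional[int]:
    """Return the cgroup's oom_kill count, or None if the file is absent (e.g. cgroup v1)."""
    p = Path(path)
    if not p.is_file():
        return None
    return parse_memory_events(p.read_text()).get("oom_kill", 0)


if __name__ == "__main__":
    count = oom_kills()
    print("memory.events not found (cgroup v1?)" if count is None else f"oom_kill = {count}")
```

From the host, `docker inspect --format '{{.State.OOMKilled}}' <container>` reports whether Docker recorded an OOM kill for the container.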
### Real Incident (2026-04-30)

The `ffmpeg-worker` container set `CONCURRENCY=20` with ~120 MB per forked process, for a total startup footprint of 2.4 GB, all consumed before any task was processed. The container hit its memory limit and was killed by Docker within seconds of `docker compose up`. The pipeline stalled for 15 minutes while the cause stayed invisible in application logs (no Python traceback, just a container restart loop). Diagnosis: `docker stats` showed memory spiking to the limit, then dropping immediately, repeating every ~30 seconds. Fix: reduce `CONCURRENCY` using the formula `floor(container_memory_MB / per_worker_MB)`.
## Related Concepts
- wiki/concepts/faster-whisper-startup-memory — model loads at startup in each worker process
- wiki/connections/celery-prefork-faster-whisper-memory-stacking — the combined effect when both apply
- wiki/concepts/docker-compose-cpu-limits-env — memory limits in Compose override files
- wiki/concepts/celery-queue-worker-specialization — specialised workers, smaller CONCURRENCY per service
## Sources
- daily/2026-04-30.md — Session 21:37, ffmpeg-worker OOM diagnosis; CONCURRENCY=20, 2.4 GB pre-task RAM, 15-minute pipeline stall