---
title: "Celery + Redis: Must Flush Redis on Deterministic Errors"
aliases: [celery-redis-flush, celery-stuck-queue, redis-task-retry-loop]
tags: [celery, redis, python, gotcha, worker, debugging]
sources:
  - "daily/2026-04-30.md"
created: 2026-04-30
updated: 2026-04-30
---

# Celery + Redis: Must Flush Redis on Deterministic Errors

When a Celery task crashes with a deterministic error (type error, missing config, wrong data shape), the task ID remains in the Redis queue and workers retry it in a loop. Resetting the job status in MongoDB alone is NOT enough: the Redis queue entry must also be cleared.

## Key Points

- A failed Celery task leaves its task ID in the Redis broker queue
- Workers retry the task on the next cycle, hit the same deterministic error, and fail again, in a loop
- Resetting the MongoDB job status (e.g., `status = "pending"`) does NOT remove the task from Redis
- Fix: flush Redis, then re-enqueue the job from scratch
- Deterministic errors (type errors, config errors, wrong data shape) will never succeed on retry; retrying them wastes worker cycles and blocks the queue
- When all retries fail identically, the error is NOT transient: diagnose the root cause before re-running

## Details

### The Stuck Queue Scenario

```
1. Task enqueued → Redis queue: [task_id_abc123]
2. Worker picks up task → crashes (TypeError: bytearray vs bytes)
3. Celery marks as failed, increments retry count
4. Task re-queued for next retry → Redis queue: [task_id_abc123]
5. Repeat until max_retries exhausted
6. Job status in MongoDB: still "processing" (or "failed")
7. Developer resets MongoDB status to "pending"
8. NEW task enqueued → Redis queue: [task_id_abc123, task_id_xyz789]
9. OLD task_id_abc123 STILL runs and fails
```

### The Fix

```bash
# Option 1: Full Redis flush (nuclear: deletes ALL keys in every Redis
# database on this instance, not just the Celery queues)
docker compose exec redis redis-cli FLUSHALL

# Option 2: Clear only the default Celery queue
docker compose exec redis redis-cli DEL celery

# Option 3: Clear a named queue (e.g., the tts queue)
docker compose exec redis redis-cli DEL tts
```

After flushing:

1. Fix the underlying code error
2. Rebuild/restart the affected worker container
3. Re-enqueue the job via the application (not by resetting MongoDB status alone)

### When to Use This Procedure

| Error type | Retry useful? | Redis flush needed? |
|------------|---------------|---------------------|
| Network timeout to external API | Yes | No |
| Rate limit (429) | Yes (with backoff) | No |
| TypeError, AttributeError | No | Yes |
| Missing env var / config | No | Yes |
| File not found (runtime dep) | No | Yes |
| DB connection error (transient) | Yes | No |

### Checking Queue Depth

```bash
# List all Redis keys (each queue is a list key named after the queue)
docker compose exec redis redis-cli KEYS '*'

# Check how many messages are waiting in each queue
docker compose exec redis redis-cli LLEN celery
docker compose exec redis redis-cli LLEN tts
```

## Related Concepts

- [[wiki/concepts/lameenc-bytearray-gcs-upload]]: example of a deterministic error (bytearray TypeError) that caused this scenario
- [[wiki/concepts/celery-queue-worker-specialization]]: queue naming and which workers consume which queue

## Sources

- [[daily/2026-04-30.md]]: Session 17:09, Celery retry loop after lameenc bytearray TypeError
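
## Appendix: Classifying Errors Before Retrying

The retry column of the "When to Use This Procedure" table can be sketched as a small classifier that separates deterministic failures from transient ones. This is a minimal illustration, not code from the project: the names `DETERMINISTIC_ERRORS`, `TRANSIENT_ERRORS`, and `should_retry` are hypothetical, and the exception classes chosen for each group are assumptions that would need adjusting to the errors a given worker actually raises.

```python
# Hypothetical helper, not part of the original codebase.

# Errors that fail identically on every attempt, so a retry cannot help.
# KeyError stands in for a missing env var read via os.environ["NAME"].
DETERMINISTIC_ERRORS = (TypeError, AttributeError, KeyError, FileNotFoundError)

# Errors that may succeed on a later attempt (network blips, timeouts).
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def should_retry(exc: BaseException) -> bool:
    """Return True only when a retry could plausibly change the outcome."""
    # Check the deterministic group first so it always wins.
    if isinstance(exc, DETERMINISTIC_ERRORS):
        return False
    # Anything not explicitly known to be transient is not retried.
    return isinstance(exc, TRANSIENT_ERRORS)
```

With Celery, one way to apply the same split is to pass only the transient tuple to the task's `autoretry_for` option, so that deterministic errors fail once instead of cycling through the queue. Whether that fits depends on how the existing tasks are declared.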