obsidian/wiki/concepts/celery-redis-queue-flush-on-deterministic-error.md
2026-04-30 21:23:56 +01:00

title: Celery + Redis: Must Flush Redis on Deterministic Errors
aliases: celery-redis-flush, celery-stuck-queue, redis-task-retry-loop
tags: celery, redis, python, gotcha, worker, debugging
sources: daily/2026-04-30.md
created: 2026-04-30
updated: 2026-04-30

Celery + Redis: Must Flush Redis on Deterministic Errors

When a Celery task fails with a deterministic error (type error, missing config, wrong data shape), its message can stay in the Redis broker queue, and workers keep picking it up and failing on it. Resetting the job status in MongoDB alone is NOT enough: the Redis queue entry must also be cleared.

Key Points

  • A failing Celery task that is re-queued for retry stays in the Redis broker queue under the same task ID
  • Workers pick it up on the next cycle, hit the same deterministic error, and fail again; the loop continues until max_retries is exhausted
  • Resetting MongoDB job status (e.g., status = "pending") does NOT remove the task from Redis
  • Fix: clear the stale entry from Redis, then re-enqueue the job from scratch
  • Deterministic errors (type errors, config errors, wrong data shape) will never succeed on retry; retrying them wastes worker cycles and blocks the queue
  • When all retries fail identically, the error is NOT transient; diagnose the root cause before re-running

Details

The Stuck Queue Scenario

1. Task enqueued → Redis queue: [task_id_abc123]
2. Worker picks up task → crashes (TypeError: bytearray vs bytes)
3. Celery marks as failed, increments retry count
4. Task re-queued for next retry → Redis queue: [task_id_abc123]
5. Repeat until max_retries exhausted
6. Job status in MongoDB: still "processing" (or "failed")
7. Developer resets MongoDB status to "pending" 
8. NEW task enqueued → Redis queue: [task_id_abc123, task_id_xyz789]
9. OLD task_id_abc123 STILL runs and fails, because resetting MongoDB never removed it from Redis
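
The scenario above can be reproduced without Celery or Redis. The following is a minimal sketch that uses a `deque` as a stand-in for the Redis list and a dict as a stand-in for the MongoDB job document; the function names and the `job-1` identifier are illustrative assumptions, not part of the real system.

```python
from collections import deque


def reset_status_only(queue, job_store, job_id, new_task_id):
    """What the developer did: reset the job record and enqueue a new task."""
    job_store[job_id] = "pending"
    queue.append(new_task_id)  # the stale entry is still ahead of the new one


def clear_then_enqueue(queue, job_store, job_id, new_task_id):
    """The fix: drop stale broker entries first, then enqueue from scratch."""
    queue.clear()  # stands in for clearing the Redis queue
    job_store[job_id] = "pending"
    queue.append(new_task_id)


# Stale message left behind by the failing task (hypothetical IDs from the note).
broker = deque(["task_id_abc123"])
jobs = {"job-1": "failed"}

reset_status_only(broker, jobs, "job-1", "task_id_xyz789")
print(list(broker))  # ['task_id_abc123', 'task_id_xyz789']

clear_then_enqueue(broker, jobs, "job-1", "task_id_new")
print(list(broker))  # ['task_id_new']
```

The job store and the broker are separate pieces of state, so an update to one never touches the other.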

The Fix

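# Option 1: Full Redis flush (nuclear: deletes EVERY key in every Redis database,
# including other queues and any cached data stored on the same instance)
docker compose exec redis redis-cli FLUSHALL

# Option 2: Delete the default Celery queue (the Redis list named "celery")
docker compose exec redis redis-cli DEL celery

# Option 3: Delete a named queue (e.g., the tts queue)
docker compose exec redis redis-cli DEL tts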

After flushing:

  1. Fix the underlying code error
  2. Rebuild/restart the affected worker container
  3. Re-enqueue the job via the application (not by resetting MongoDB status alone)
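
Deleting a whole queue also discards any healthy tasks waiting in it. A narrower alternative is to remove only the messages that belong to the stuck task ID. The sketch below assumes the queue is a Redis list of JSON messages whose `headers.id` field holds the Celery task ID (the layout used by Celery's default message protocol with the Redis transport); verify this against a real message before relying on it. `remove_task_messages` is a hypothetical helper, and `client` is anything that exposes redis-py's `lrange` and `lrem` methods.

```python
import json


def remove_task_messages(client, queue_name, task_id):
    """Remove every message for task_id from a Redis list used as a queue.

    Returns the number of list entries removed.
    """
    removed = 0
    for raw in client.lrange(queue_name, 0, -1):
        try:
            message = json.loads(raw)
        except (TypeError, ValueError):
            continue  # not a JSON message; leave it untouched
        if not isinstance(message, dict):
            continue
        headers = message.get("headers")
        # Assumption: the task ID lives at headers["id"].
        if isinstance(headers, dict) and headers.get("id") == task_id:
            # count=0 asks Redis to remove all entries equal to `raw`.
            removed += client.lrem(queue_name, 0, raw)
    return removed
```

With a real connection this might be called as `remove_task_messages(redis.Redis(host="localhost"), "celery", "task_id_abc123")`, where the host name is a placeholder. Messages a worker has already picked up are no longer in the list, so stopping the workers first is the safer order.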

When to Use This Procedure

| Error type | Retry useful? | Redis flush needed? |
| --- | --- | --- |
| Network timeout to external API | Yes | No |
| Rate limit (429) | Yes (with backoff) | No |
| TypeError, AttributeError | No | Yes |
| Missing env var / config | No | Yes |
| File not found (runtime dep) | No | Yes |
| DB connection error (transient) | Yes | No |
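
The table can be encoded as a small classifier so the retry decision is made in one place. This is a sketch under assumptions: `RateLimitError` is a hypothetical exception standing in for an HTTP 429 response, `KeyError` stands in for a missing environment variable, and unknown exceptions are treated as non-retryable until someone has diagnosed them.

```python
from typing import NamedTuple


class RateLimitError(Exception):
    """Hypothetical exception raised when an external API returns HTTP 429."""


class Decision(NamedTuple):
    retry: bool         # is another attempt worthwhile?
    flush_needed: bool  # should the stale queue entry be cleared?


# Transient failures: a later attempt may succeed.
TRANSIENT = (TimeoutError, ConnectionError, RateLimitError)
# Deterministic failures: the same input fails the same way every time.
DETERMINISTIC = (TypeError, AttributeError, KeyError, FileNotFoundError)


def classify(exc):
    """Map an exception instance to the retry/flush decision from the table."""
    if isinstance(exc, TRANSIENT):
        return Decision(retry=True, flush_needed=False)
    # Known deterministic errors and anything unrecognized: do not retry.
    return Decision(retry=False, flush_needed=True)


print(classify(TimeoutError("upstream timed out")))
print(classify(TypeError("expected bytes, got bytearray")))
```

In a Celery task, the transient tuple is the kind of value that could be passed to the `autoretry_for` task option together with `retry_backoff` and `max_retries`, so that only those exceptions are retried automatically.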

Checking Queue Depth

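# List all Redis keys (Celery queues show up as lists named after the queue).
# KEYS scans the entire keyspace, so avoid it on large production instances.
docker compose exec redis redis-cli KEYS '*'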

# Check queue length
docker compose exec redis redis-cli LLEN celery
docker compose exec redis redis-cli LLEN tts
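
LLEN only reports how many messages are waiting, not which ones. To confirm that a specific stale task ID is gone, the queued messages can be decoded. The sketch below assumes each list entry is a JSON message with the task ID at `headers.id` and the task name at `headers.task` (the layout of Celery's default message protocol on the Redis transport); `pending_tasks` is a hypothetical helper, and `client` is anything that exposes redis-py's `lrange` method.

```python
import json


def pending_tasks(client, queue_name):
    """Return a list of (task_id, task_name) for messages waiting in a queue."""
    pending = []
    for raw in client.lrange(queue_name, 0, -1):
        try:
            message = json.loads(raw)
        except (TypeError, ValueError):
            continue  # skip entries that are not JSON messages
        headers = message.get("headers") if isinstance(message, dict) else None
        if isinstance(headers, dict):
            # Assumption: headers carry "id" and "task" keys.
            pending.append((headers.get("id"), headers.get("task")))
    return pending
```

An empty result for the queue, or a result that no longer contains the old task ID, suggests the stale entry has been cleared and the job can be re-enqueued.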

Sources