obsidian/wiki/concepts/celery-redis-queue-flush-on-deterministic-error.md
2026-04-30 21:23:56 +01:00

---
title: "Celery + Redis: Must Flush Redis on Deterministic Errors"
aliases: [celery-redis-flush, celery-stuck-queue, redis-task-retry-loop]
tags: [celery, redis, python, gotcha, worker, debugging]
sources:
- "daily/2026-04-30.md"
created: 2026-04-30
updated: 2026-04-30
---
# Celery + Redis: Must Flush Redis on Deterministic Errors
When a Celery task that retries on failure crashes with a deterministic error (TypeError, missing config, wrong data shape), the same task ID goes back into the Redis queue and workers retry it in a loop. Resetting the job status in MongoDB alone is NOT enough: the Redis queue entry must also be cleared.
## Key Points
- A failed Celery task that is retried is re-queued in the Redis broker under the same task ID
- Workers pick it up again, hit the same deterministic error, and fail again: the loop only ends when `max_retries` is exhausted (or never, with `max_retries=None`)
- Resetting MongoDB job status (e.g., `status = "pending"`) does NOT remove the task from Redis
- Fix: flush Redis + re-enqueue the job from scratch
- Deterministic errors (type errors, config errors, wrong data shape) will never succeed on retry — retrying them wastes worker cycles and blocks the queue
- When all retries fail identically, the error is NOT transient — diagnose the root cause before re-running
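One way to check the last point from worker logs is sketched below. It assumes the log has one line per attempt containing the task ID and the exception repr (Celery's `retry:` and `raised unexpected:` lines usually do); `distinct_errors_for_task` and the log path are hypothetical. A result of `1` across every attempt is a strong hint that the error is deterministic.

```bash
# Hypothetical helper: count distinct exception class names logged for one task ID.
# Assumes lines like "Task app.tasks.tts[abc123] raised unexpected: TypeError(...)".
distinct_errors_for_task() {
  local log_file="$1" task_id="$2"
  # -F matches the task ID literally; -oE pulls out names ending in Error/Exception.
  grep -F -- "$task_id" "$log_file" \
    | grep -oE '[A-Z][A-Za-z]*(Error|Exception)' \
    | sort -u \
    | wc -l \
    | tr -d ' '
}
```

Usage: `distinct_errors_for_task worker.log abc123` prints `1` if every attempt failed with the same exception type, and `0` if the ID never appears.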
## Details
### The Stuck Queue Scenario
```
1. Task enqueued → Redis queue: [task_id_abc123]
2. Worker picks up task → crashes (TypeError: bytearray vs bytes)
3. Celery records the exception (state RETRY), increments retry count
4. Task re-queued for next retry → Redis queue: [task_id_abc123]
5. Repeat until max_retries exhausted
6. Job status in MongoDB: still "processing" (or "failed")
7. Developer resets MongoDB status to "pending"
8. NEW task enqueued → Redis queue: [task_id_abc123, task_id_xyz789]
9. OLD task_id_abc123 STILL runs and fails
```
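To confirm step 9 before deleting anything, the queue list can be searched for the old task ID. This is a sketch that assumes the default Redis transport (each pending message is a one-line JSON string whose headers carry the task ID) and a compose service named `redis`; `count_task_in_queue` is a hypothetical helper. A `0` is not conclusive: a retry scheduled with a countdown is usually reserved by a worker and tracked under the transport's `unacked` key rather than in the queue list.

```bash
# Overridable so the same helper works for a dry run or another DB ("... redis-cli -n 1").
REDIS_CLI="${REDIS_CLI:-docker compose exec -T redis redis-cli}"

# Print how many pending messages in QUEUE mention TASK_ID (0 = not in the list).
count_task_in_queue() {
  local queue="$1" task_id="$2"
  # LRANGE 0 -1 returns every message in the list, one per line when piped;
  # grep -c exits 1 on zero matches, so "|| true" keeps the printed 0 usable.
  $REDIS_CLI LRANGE "$queue" 0 -1 | grep -c -- "$task_id" || true
}
```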
### The Fix
```bash
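# Option 1: Full Redis flush (nuclear: deletes every key in every DB, including
# task results if Redis is also the result backend)
docker compose exec redis redis-cli FLUSHALL
# Option 2: Clear the default queue (the Redis list key has the queue's name)
docker compose exec redis redis-cli DEL celery
# Option 3: Clear a named queue (e.g., the tts queue)
docker compose exec redis redis-cli DEL tts
# Options 2-3 act on DB 0; if the broker URL ends in /1, add -n 1 before DEL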
```
After flushing:
1. Fix the underlying code error
2. Rebuild/restart the affected worker container
3. Re-enqueue the job via the application (not by resetting MongoDB status alone)
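The procedure can be sketched as a script. Assumptions: compose services named `redis` and `worker` (override with `WORKER_SERVICE`), the broker on Redis DB 0 (override with `REDIS_DB`), and `DRY_RUN=1` to print the commands instead of running them; `run` and `recover_queue` are hypothetical names. It stops the worker *before* deleting, on the assumption that the Redis transport returns a worker's reserved (unacked) messages to the queue during a clean shutdown; deleting first could let the stale task reappear. Re-enqueueing stays a manual step through the application.

```bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: recover_queue QUEUE [QUEUE...]  (commit the code fix first)
recover_queue() {
  local worker="${WORKER_SERVICE:-worker}" db="${REDIS_DB:-0}" q
  # 1. Stop the worker first so messages it had reserved are back on the
  #    queue (clean shutdown) before the DEL below runs.
  run docker compose stop "$worker"
  # 2. Delete each stale queue list; -T skips TTY allocation when scripted.
  for q in "$@"; do
    run docker compose exec -T redis redis-cli -n "$db" DEL "$q"
  done
  # 3. Rebuild and start the worker with the fixed code.
  run docker compose up -d --build "$worker"
  # 4. Re-enqueue the job through the application (app-specific, not shown).
}
```

Usage: `DRY_RUN=1 recover_queue tts` prints the three commands; drop `DRY_RUN` to execute them.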
### When to Use This Procedure
| Error type | Retry useful? | Redis flush needed? |
|------------|---------------|---------------------|
| Network timeout to external API | Yes | No |
| Rate limit (429) | Yes (with backoff) | No |
| TypeError, AttributeError | No | Yes |
| Missing env var / config | No | Yes |
| File not found (runtime dep) | No | Yes |
| DB connection error (transient) | Yes | No |
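The table can be turned into a rough triage helper for worker tracebacks. This is only a sketch: `triage_error` is hypothetical, and the mapping from Python exception names to table rows is an assumption (for example, a missing env var read via `os.environ[...]` surfaces as `KeyError`; `AutoReconnect` and `ServerSelectionTimeoutError` are pymongo's connection errors). Rate limits are left out because 429 responses usually arrive as library-specific exceptions.

```bash
# Hypothetical triage helper: exception class name -> suggested action.
# The name-to-row mapping below is an assumption; extend it for your own code.
triage_error() {
  case "$1" in
    # Deterministic rows: TypeError/AttributeError, missing config, missing file.
    # Assumption: a missing env var read via os.environ[...] raises KeyError.
    TypeError|AttributeError|KeyError|FileNotFoundError)
      echo "deterministic: do not retry, flush queue" ;;
    # Transient rows: network timeouts and (pymongo) DB connection errors.
    TimeoutError|ConnectionError|AutoReconnect|ServerSelectionTimeoutError)
      echo "transient: retry with backoff, no flush" ;;
    *)
      echo "unknown: read the traceback before retrying" ;;
  esac
}
```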
### Checking Queue Depth
```bash
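# List all keys: queue lists plus result/bookkeeping keys (KEYS blocks Redis
# while it runs; on a busy instance use: redis-cli --scan)
docker compose exec redis redis-cli KEYS '*'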
# Check queue length
docker compose exec redis redis-cli LLEN celery
docker compose exec redis redis-cli LLEN tts
```
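A small wrapper for checking several queues at once, sketched under the same assumptions (compose service `redis`, DB 0, hypothetical function names). `LLEN` returns 0 for a key that does not exist, so a drained or deleted queue reads as 0, and `--scan` walks key names with `SCAN` instead of blocking the server the way `KEYS` does.

```bash
# Overridable, e.g. REDIS_CLI="docker compose exec -T redis redis-cli -n 1".
REDIS_CLI="${REDIS_CLI:-docker compose exec -T redis redis-cli}"

# Print "<queue> <length>" for each queue name given.
queue_depths() {
  local q
  for q in "$@"; do
    printf '%s %s\n' "$q" "$($REDIS_CLI LLEN "$q")"
  done
}

# Non-blocking alternative to KEYS '*': iterate key names matching a pattern.
list_keys() {
  $REDIS_CLI --scan --pattern "${1:-*}"
}
```

Usage: `queue_depths celery tts` before and after a flush; both should read 0 afterwards.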
## Related Concepts
- [[wiki/concepts/lameenc-bytearray-gcs-upload]] — example of a deterministic error (bytearray TypeError) that caused this scenario
- [[wiki/concepts/celery-queue-worker-specialization]] — queue naming and which workers consume which queue
## Sources
- [[daily/2026-04-30.md]] — Session 17:09, Celery retry loop after lameenc bytearray TypeError