---
title: "Celery + Redis: Must Flush Redis on Deterministic Errors"
aliases: [celery-redis-flush, celery-stuck-queue, redis-task-retry-loop]
tags: [celery, redis, python, gotcha, worker, debugging]
sources:
- "daily/2026-04-30.md"
created: 2026-04-30
updated: 2026-04-30
---

# Celery + Redis: Must Flush Redis on Deterministic Errors

When a Celery task fails with a deterministic error (type error, missing config, wrong data shape), its message can remain in the Redis broker queue, and workers keep retrying it. Resetting the job status in MongoDB alone is NOT enough: the Redis queue entry must also be cleared.

## Key Points

- A failed Celery task can leave its task ID in the Redis broker queue
- Workers retry the task on the next cycle, hit the same deterministic error, and fail again, repeating until `max_retries` is exhausted
- Resetting MongoDB job status (e.g., `status = "pending"`) does NOT remove the task from Redis
- Fix: flush Redis, then re-enqueue the job from scratch
- Deterministic errors (type errors, config errors, wrong data shape) will never succeed on retry; retrying them wastes worker cycles and blocks the queue
- When all retries fail identically, the error is NOT transient; diagnose the root cause before re-running

## Details

### The Stuck Queue Scenario

```
1. Task enqueued → Redis queue: [task_id_abc123]
2. Worker picks up task → crashes (TypeError: bytearray vs bytes)
3. Celery marks as failed, increments retry count
4. Task re-queued for next retry → Redis queue: [task_id_abc123]
5. Repeat until max_retries exhausted
6. Job status in MongoDB: still "processing" (or "failed")
7. Developer resets MongoDB status to "pending"
8. NEW task enqueued → Redis queue: [task_id_abc123, task_id_xyz789]
9. OLD task_id_abc123 STILL runs and fails
```

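The core of the scenario is that the broker queue and the job-status store are two independent pieces of state. A minimal sketch of that, using an in-memory `deque` as a stand-in for the Redis list and a `dict` as a stand-in for the MongoDB document. The job ID `job-1`, the task ID `task_id_new456`, and both function names are hypothetical, chosen only for illustration; this is not Celery's or the application's actual code.

```python
from collections import deque


def reset_status_only(broker_queue, job_status, job_id, new_task_id):
    """Mimic steps 7-8: reset the status and enqueue a new task, leaving the queue as-is."""
    job_status[job_id] = "pending"
    broker_queue.append(new_task_id)


def flush_and_reenqueue(broker_queue, job_status, job_id, new_task_id):
    """Mimic the fix: clear the broker queue first, then enqueue from scratch."""
    broker_queue.clear()
    job_status[job_id] = "pending"
    broker_queue.append(new_task_id)


# Stand-ins (assumptions): a deque for the Redis list, a dict for the MongoDB status.
broker_queue = deque(["task_id_abc123"])
job_status = {"job-1": "failed"}

reset_status_only(broker_queue, job_status, "job-1", "task_id_xyz789")
after_reset = list(broker_queue)  # the old task is still queued next to the new one

flush_and_reenqueue(broker_queue, job_status, "job-1", "task_id_new456")
after_flush = list(broker_queue)  # only the freshly enqueued task remains
```

Changing the status never touches the queue in this model, which is why `task_id_abc123` survives the reset and disappears only after the flush.
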
### The Fix

```bash
# Option 1: Full Redis flush (nuclear: deletes every key in every Redis database, not only Celery queues)
docker compose exec redis redis-cli FLUSHALL

# Option 2: Clear the default queue (Celery's default queue is named "celery")
docker compose exec redis redis-cli DEL celery

# Option 3: Clear a named queue (e.g., the tts queue)
docker compose exec redis redis-cli DEL tts
```

After flushing:
1. Fix the underlying code error
2. Rebuild/restart the affected worker container
3. Re-enqueue the job via the application (not by resetting MongoDB status alone)

### When to Use This Procedure

| Error type | Retry useful? | Redis flush needed? |
|------------|---------------|---------------------|
| Network timeout to external API | Yes | No |
| Rate limit (429) | Yes (with backoff) | No |
| TypeError, AttributeError | No | Yes |
| Missing env var / config | No | Yes |
| File not found (runtime dep) | No | Yes |
| DB connection error (transient) | Yes | No |

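The table amounts to a rule: retry only errors that could plausibly succeed next time. A minimal plain-Python sketch of that rule follows. The exception groupings and the names `is_retryable` and `run_with_retry` are assumptions made for illustration, not part of Celery; map them onto the exception classes your own tasks actually raise.

```python
import time

# Assumption: these groupings mirror the table; adjust them for your codebase.
TRANSIENT = (TimeoutError, ConnectionError)
DETERMINISTIC = (TypeError, AttributeError, KeyError, FileNotFoundError)


def is_retryable(exc):
    """Return True only for errors that may succeed on a later attempt."""
    if isinstance(exc, DETERMINISTIC):
        return False
    return isinstance(exc, TRANSIENT)


def run_with_retry(func, max_retries=3, base_delay=0.0, sleep=time.sleep):
    """Call func, retrying transient errors with exponential backoff.

    Deterministic errors, and anything unclassified, are re-raised immediately.
    """
    attempt = 0
    while True:
        try:
            return func()
        except Exception as exc:
            if not is_retryable(exc) or attempt >= max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

In Celery itself, the comparable controls are the `autoretry_for` and `retry_backoff` task options: listing only transient exception classes in `autoretry_for` keeps a `TypeError` from being retried at all.
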
### Checking Queue Depth

```bash
# List all Redis keys (each queue appears as a key named after the queue)
docker compose exec redis redis-cli KEYS '*'

# Check queue length
docker compose exec redis redis-cli LLEN celery
docker compose exec redis redis-cli LLEN tts
```

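Queue length alone does not show whether the failing task is still queued. The sketch below extracts task IDs from raw queue entries, such as the strings returned by `redis-cli LRANGE celery 0 -1`. It assumes each entry is a JSON envelope with the task ID under `headers.id`, which is the layout used by Celery's message protocol 2; check one real entry from your own queue before relying on it. The function names are hypothetical.

```python
import json


def queued_task_ids(raw_entries):
    """Return the task ID of each raw entry, or None where it cannot be read."""
    ids = []
    for raw in raw_entries:
        try:
            envelope = json.loads(raw)
            # Assumption: protocol 2 envelope, task ID stored under headers.id
            ids.append(envelope["headers"]["id"])
        except (ValueError, KeyError, TypeError):
            ids.append(None)
    return ids


def is_still_queued(raw_entries, task_id):
    """True if task_id appears among the readable queue entries."""
    return task_id in queued_task_ids(raw_entries)


# Synthetic entries for illustration only; real entries carry more fields.
sample_entries = [
    json.dumps({"headers": {"id": "task_id_abc123"}, "body": ""}),
    json.dumps({"headers": {"id": "task_id_xyz789"}, "body": ""}),
    "not json",
]
```

With the sample data, `is_still_queued(sample_entries, "task_id_abc123")` reports that the old task is still present, which is the condition that calls for clearing the queue.
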
## Related Concepts

- [[wiki/concepts/lameenc-bytearray-gcs-upload]]: example of a deterministic error (bytearray TypeError) that caused this scenario
- [[wiki/concepts/celery-queue-worker-specialization]]: queue naming and which workers consume which queue

## Sources

- [[daily/2026-04-30.md]]: Session 17:09, Celery retry loop after lameenc bytearray TypeError