---
title: "Redis + Celery Async Worker Queue"
aliases: [celery, task-queue, worker, redis-queue]
tags: [redis, celery, async, worker, queue, python]
sources: [01 Projects/enterprise-ai-hub-nexus, 01 Projects/video-accessibility, 01 Projects/pdf-accessibility]
created: 2026-04-15
updated: 2026-04-15
---
# Redis + Celery Async Worker Queue
Pattern for offloading long-running AI and processing tasks to background workers so that HTTP requests return immediately. Used in Oliver's heaviest processing pipelines.
## Key Takeaways
- Redis is both the message broker AND result backend for Celery
- Use Celery when tasks take >5s (AI inference, video processing, PDF analysis)
- `Celery beat` for scheduled recurring tasks (e.g., SharePoint sync)
- PDF Accessibility skips Celery: a simpler `worker.py` daemon reads jobs straight from the Redis queue `pdf:queue`
- Always poll for task status from the frontend; never block on long tasks
## When to Use
- Video processing pipelines (multi-phase, minutes-long)
- Scheduled sync jobs (Celery beat)
- Any task that would make an HTTP request time out (>30s)
- Parallel AI analysis tasks
## Key Details
### Standard Setup
```yaml
# docker-compose.yml
services:
  redis:
    image: redis:7
    ports: ["6379:6379"]
  worker:
    build: ./backend
    command: celery -A app.celery worker --loglevel=info
    depends_on: [redis]
  beat:
    build: ./backend
    command: celery -A app.celery beat --loglevel=info
    depends_on: [redis]
```
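Because Redis doubles as broker and result backend, the Celery side of this compose setup mostly comes down to a few settings. A minimal sketch, assuming the app object lives in `app/celery.py` (as the `-A app.celery` flag suggests) and reads a hypothetical `REDIS_URL` environment variable. The `app.tasks.sync_sharepoint` task name and the 15-minute interval are placeholders, not the Enterprise Nexus values.

```python
import os
from datetime import timedelta


def celery_settings(redis_url: str) -> dict:
    """Settings for celery.conf.update(); one Redis URL serves as broker and result backend."""
    return {
        "broker_url": redis_url,
        "result_backend": redis_url,
        # Report STARTED while a task runs instead of leaving it PENDING
        "task_track_started": True,
        # Read only by the `beat` process, which enqueues these tasks for the workers
        "beat_schedule": {
            "sharepoint-sync": {
                "task": "app.tasks.sync_sharepoint",  # hypothetical task name
                "schedule": timedelta(minutes=15),  # placeholder interval
            },
        },
    }


# Hypothetical env var; the default host is the compose service name `redis`
REDIS_URL = os.environ.get("REDIS_URL", "redis://redis:6379/0")

# In app/celery.py (assumed layout):
#   from celery import Celery
#   celery = Celery("app")
#   celery.conf.update(celery_settings(REDIS_URL))
```

Workers and beat share the same settings; only the `beat` container acts on `beat_schedule`, and the workers just consume what it enqueues.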
### Task Definition
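```python
# e.g. app/tasks.py; the Celery app lives in app/celery.py (matches `celery -A app.celery`)
from app.celery import celery

# phase_* helpers are project pipeline code, not shown here
@celery.task
def process_video(video_id: str) -> None:
    # Long-running pipeline (simplified; the full 7 phases are listed below)
    phase_1_ingest(video_id)
    phase_2_caption(video_id)  # Gemini 2.5 Pro
    phase_3_translate(video_id)
    phase_4_tts(video_id)
```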
### Polling Pattern (Frontend)
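```js
// Poll until the job leaves 'pending', then resolve with its final status.
// Assumes api.get resolves to the parsed JSON body, e.g. { status: 'pending' }.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

const poll = async (jobId, intervalMs = 2000) => {
  while (true) {
    const { status } = await api.get(`/jobs/${jobId}/status`)
    if (status !== 'pending') return status
    await sleep(intervalMs)
  }
}
```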
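On the backend, the status endpoint the frontend polls has to collapse Celery's task states into the small set of values the client checks. A sketch, assuming the API exposes `pending` (from the polling code) plus hypothetical `done` and `failed` values. Note that Celery reports `PENDING` for any task ID it has no record of, including expired results, which is one more reason to keep job status in the database.

```python
# Celery states that mean "stop polling, the task failed" (names from celery.states)
FAILED_STATES = frozenset({"FAILURE", "REVOKED"})


def to_api_status(celery_state: str) -> str:
    """Collapse a Celery task state into the status value the frontend polls for."""
    if celery_state == "SUCCESS":
        return "done"  # hypothetical API value
    if celery_state in FAILED_STATES:
        return "failed"  # hypothetical API value
    # PENDING, RECEIVED, STARTED, RETRY and custom progress states: keep polling
    return "pending"


# In the status endpoint (sketch):
#   from celery.result import AsyncResult
#   status = to_api_status(AsyncResult(job_id, app=celery).state)
```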
## Projects Using This Pattern
- [[01 Projects/enterprise-ai-hub-nexus/Enterprise AI Hub Nexus|Enterprise Nexus]] — Celery beat for SharePoint sync + scheduled tasks; Redis 7 + PostgreSQL
- [[01 Projects/video-accessibility/Video Accessibility Platform|Video Accessibility]] — Celery workers for 7-phase video pipeline; Redis + MongoDB Atlas + GCS
- [[01 Projects/pdf-accessibility/PDF Accessibility Checker|PDF Accessibility]] — Custom `worker.py` daemon reading `pdf:queue` from Redis; PostgreSQL for job tracking
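The PDF Accessibility variant needs no Celery at all: a loop that blocks on the Redis queue and records each job's status in a database. A minimal sketch, assuming jobs are JSON payloads with an `id` field (hypothetical) pushed with `LPUSH` and popped with `BRPOP`, which gives FIFO order. `sqlite3` stands in for the project's PostgreSQL job table, and the table schema is made up for illustration. The `client` can be a `redis.Redis` instance, whose `brpop(keys, timeout)` returns a `(key, value)` pair or `None` on timeout.

```python
import json
import sqlite3
from typing import Any, Callable, Optional

QUEUE_KEY = "pdf:queue"


def init_db(conn: sqlite3.Connection) -> None:
    # Stand-in for the PostgreSQL job-tracking table (hypothetical schema)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, status TEXT NOT NULL, error TEXT)"
    )
    conn.commit()


def set_status(conn: sqlite3.Connection, job_id: str, status: str, error: Optional[str] = None) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO jobs (id, status, error) VALUES (?, ?, ?)",
        (job_id, status, error),
    )
    conn.commit()


def run_once(
    client: Any, conn: sqlite3.Connection, handler: Callable[[dict], None], timeout: int = 5
) -> Optional[str]:
    """Pop one job (blocking up to `timeout` seconds) and process it; return its id, or None if idle."""
    item = client.brpop([QUEUE_KEY], timeout=timeout)
    if item is None:
        return None
    _key, raw = item
    job = json.loads(raw)  # assumed payload shape: {"id": "...", ...}
    job_id = job["id"]
    set_status(conn, job_id, "running")
    try:
        handler(job)
    except Exception as exc:  # record the failure instead of crashing the daemon
        set_status(conn, job_id, "failed", str(exc))
    else:
        set_status(conn, job_id, "done")
    return job_id


def run_forever(client: Any, conn: sqlite3.Connection, handler: Callable[[dict], None]) -> None:
    while True:
        run_once(client, conn, handler)
```

A producer would enqueue with `client.lpush(QUEUE_KEY, json.dumps({"id": job_id}))`. Writing status to the database rather than Redis keeps job history intact across a Redis restart.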
## Pipeline Phases (Video Accessibility)
```
1. Upload → Ingestion worker
2. Gemini 2.5 Pro → VTT captions
3. Audio Description generation
4. QC review (approve/reject/edit VTT)
5. Translation → 50+ languages
6. TTS synthesis (GCP TTS + ElevenLabs)
7. Final delivery
```
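With this many phases, the frontend benefits from knowing which phase is running, not just that the job is pending. A sketch of a phase runner that reports progress before each phase and can resume from a given index, assuming phases are `(name, function)` pairs; the names and the `start` parameter are illustrative, not the Video Accessibility code. Inside a bound Celery task, `report` could wrap `self.update_state(state="PROGRESS", meta=...)`, where `PROGRESS` is a custom state.

```python
from typing import Callable, List, Sequence, Tuple

# (phase name, function taking the video id); names are illustrative
Phase = Tuple[str, Callable[[str], None]]


def run_pipeline(
    video_id: str,
    phases: Sequence[Phase],
    report: Callable[[dict], None],
    start: int = 0,
) -> List[str]:
    """Run phases in order from index `start`, reporting progress before each one.

    Returns the names of the phases that ran. An exception in a phase propagates,
    so the task is marked failed; a retry can pass `start` to resume at that phase.
    """
    done: List[str] = []
    total = len(phases)
    for index in range(start, total):
        name, fn = phases[index]
        report({"phase": name, "current": index + 1, "total": total})
        fn(video_id)
        done.append(name)
    return done


# Inside a bound task (sketch):
#   @celery.task(bind=True)
#   def process_video(self, video_id):
#       run_pipeline(video_id, PHASES,
#                    lambda meta: self.update_state(state="PROGRESS", meta=meta))
```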
## Gotchas & Lessons
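- Celery beat needs its own container, and exactly one instance of it: beat only enqueues scheduled tasks (workers execute them), so a second beat process would enqueue every scheduled task twice
- Long Celery jobs that need M365 access (Enterprise Nexus) must refresh their token proactively; a token fetched at job start can expire before the job finishes
- A plain `worker.py` daemon is a simpler alternative to Celery for single-queue use (PDF Accessibility pattern)
- Always store job status in the database (not just Redis) so it survives a Redis restart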
- `video_accessibility_development_plan.txt` is the authoritative spec — always read before touching that pipeline
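Proactive refresh means renewing the token shortly before it expires instead of waiting for a 401 halfway through a long job. A minimal sketch, assuming a hypothetical `fetch` callable that returns a fresh token with its expiry time (in practice it would wrap whatever M365 token call the project uses, e.g. an MSAL client); the 5-minute margin is an arbitrary choice.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    access_token: str
    expires_at: float  # epoch seconds


class ProactiveTokenCache:
    """Return a cached token, refreshing it once it is within `margin_s` of expiry."""

    def __init__(
        self,
        fetch: Callable[[], Token],  # hypothetical token source
        margin_s: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = margin_s
        self._clock = clock
        self._token: Optional[Token] = None

    def get(self) -> str:
        # Refresh before expiry so a call made mid-job never carries a stale token
        if self._token is None or self._clock() >= self._token.expires_at - self._margin:
            self._token = self._fetch()
        return self._token.access_token
```

Calling `cache.get()` before each M365 request inside the task, rather than reading a token once at job start, keeps multi-hour jobs authenticated.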
## Related
- [[wiki/tech-patterns/fastapi-python-docker|fastapi-python-docker]] — the API layer above
- [[wiki/tech-patterns/python-ai-agents|python-ai-agents]] — what the workers execute
- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]] — why polling beats streaming