---
title: "Redis + Celery Async Worker Queue"
aliases: [celery, task-queue, worker, redis-queue]
tags: [redis, celery, async, worker, queue, python]
sources: [01 Projects/enterprise-ai-hub-nexus, 01 Projects/video-accessibility, 01 Projects/pdf-accessibility]
created: 2026-04-15
updated: 2026-04-15
---

# Redis + Celery Async Worker Queue

Pattern for offloading long-running AI/processing tasks to background workers. Used in the heaviest Oliver processing pipelines.

## Key Takeaways
- Redis serves as both the message broker and the result backend for Celery
- Use Celery when a task takes longer than about 5 s (AI inference, video processing, PDF analysis)
- Use Celery beat (`celery beat`) for scheduled recurring tasks (e.g., SharePoint sync)
- PDF Accessibility reads a Redis queue (`pdf:queue`) directly, without Celery, through a simpler `worker.py` daemon
- Have the frontend poll for task status; never hold a request open while a long task runs

## When to Use
- Video processing pipelines (multi-phase, minutes-long)
- Scheduled sync jobs (Celery beat)
- Any task that would time out an HTTP request (>30s)
- Parallel AI analysis tasks

## Key Details

### Standard Setup
```yaml
# docker-compose.yml
services:
  redis:
    image: redis:7
    ports: ["6379:6379"]
  worker:
    build: ./backend
    command: celery -A app.celery worker --loglevel=info
    depends_on: [redis]
  beat:
    build: ./backend
    command: celery -A app.celery beat --loglevel=info
    depends_on: [redis]
```
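
The compose file above starts the worker and beat with `-A app.celery` but does not show how that app finds Redis. Below is a minimal settings sketch, assuming the app loads a settings module through `config_from_object`. The module name `celeryconfig`, the `REDIS_URL` variable, the task name `app.tasks.sync_sharepoint`, and the 15-minute interval are all hypothetical.

```python
# celeryconfig.py (hypothetical module name), loaded by the Celery app with
#   app.config_from_object("celeryconfig")
import os
from datetime import timedelta

# Assumption: the hostname "redis" is the compose service name.
REDIS_URL = os.environ.get("REDIS_URL", "redis://redis:6379/0")

# Redis plays both roles: it carries task messages and stores task results.
broker_url = REDIS_URL
result_backend = REDIS_URL

# Drop stored results after an hour so old task state does not pile up.
result_expires = timedelta(hours=1)

# Consumed by the beat process, which enqueues the task on this interval.
beat_schedule = {
    "sharepoint-sync": {
        "task": "app.tasks.sync_sharepoint",  # hypothetical task name
        "schedule": timedelta(minutes=15),    # hypothetical interval
    },
}
```

Keeping the settings in a plain module means the worker and beat containers read the same values, and the Redis location can be overridden per environment without touching code.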

### Task Definition
```python
# Assumes `celery` is the Celery app instance loaded by `-A app.celery`.
@celery.task
def process_video(video_id: str):
    # Long-running pipeline; each phase blocks until it finishes
    phase_1_ingest(video_id)
    phase_2_caption(video_id)  # Gemini 2.5 Pro
    phase_3_translate(video_id)
    phase_4_tts(video_id)
```
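
The task above runs its phases back to back, so a status endpoint has nothing to report until the whole job ends. One option is to record each phase as it starts. The sketch below assumes a hypothetical `run_phases` helper and a `record` callback that stands in for whatever writes the status (a database update, for example). Neither name comes from the projects.

```python
from typing import Callable, Iterable, Tuple

# A phase is a (name, function) pair; the function takes the job ID.
Phase = Tuple[str, Callable[[str], None]]


def run_phases(
    job_id: str,
    phases: Iterable[Phase],
    record: Callable[[str, str], None],
) -> None:
    """Run phases in order, recording progress before each one starts."""
    for name, phase in phases:
        record(job_id, f"running:{name}")
        try:
            phase(job_id)
        except Exception:
            record(job_id, f"failed:{name}")
            raise  # re-raise so the task itself is also marked as failed
    record(job_id, "complete")
```

Inside the task body this would be called as `run_phases(video_id, [("ingest", phase_1_ingest), ...], record)`. Each status write happens before the phase starts, so a poller sees which phase is running rather than only the final outcome.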

### Polling Pattern (Frontend)
```js
// Poll every 2 s until the job is no longer pending.
// Assumes api.get() resolves to the parsed response body, e.g. { status: 'pending' }.
const poll = async (jobId) => {
  const { status } = await api.get(`/jobs/${jobId}/status`)
  if (status === 'pending') setTimeout(() => poll(jobId), 2000)
}
```
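
The poller above only distinguishes `'pending'` from everything else, so the status endpoint has to collapse Celery's task states into a small vocabulary. Here is a sketch of that mapping, assuming the endpoint reads the state from Celery's result backend. The function name and the `'complete'` and `'failed'` labels are hypothetical; the upper-case state names are Celery's built-in ones.

```python
# Celery's built-in task states, grouped by what the frontend should do next.
IN_PROGRESS = {"PENDING", "RECEIVED", "STARTED", "RETRY"}
FAILED = {"FAILURE", "REVOKED"}


def frontend_status(celery_state: str) -> str:
    """Collapse a Celery task state into the status string the poller checks."""
    if celery_state in IN_PROGRESS:
        return "pending"  # the poller keeps polling
    if celery_state == "SUCCESS":
        return "complete"
    if celery_state in FAILED:
        return "failed"
    # Custom states (set with update_state) are treated as still running.
    return "pending"
```

One caveat with this approach: Celery reports `PENDING` for any task ID it has no record of, so a job whose state was lost from Redis looks the same as one that has not started yet. A poller driven only by this mapping would keep polling such a job indefinitely.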

## Projects Using This Pattern
- [[01 Projects/enterprise-ai-hub-nexus/Enterprise AI Hub Nexus|Enterprise Nexus]]: Celery beat for SharePoint sync + scheduled tasks; Redis 7 + PostgreSQL
- [[01 Projects/video-accessibility/Video Accessibility Platform|Video Accessibility]]: Celery workers for 7-phase video pipeline; Redis + MongoDB Atlas + GCS
- [[01 Projects/pdf-accessibility/PDF Accessibility Checker|PDF Accessibility]]: Custom `worker.py` daemon reading `pdf:queue` from Redis; PostgreSQL for job tracking
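
For the PDF Accessibility variant, a daemon that blocks on a Redis list can replace Celery entirely. The sketch below assumes jobs are pushed onto `pdf:queue` as JSON strings and that the client exposes `blpop` the way redis-py's `redis.Redis` does. The `handle_job` callback, the payload shape, and the `max_jobs` stop condition are hypothetical and not taken from the project's `worker.py`.

```python
import json
from typing import Any, Callable, Optional

QUEUE_KEY = "pdf:queue"


def run_worker(
    client: Any,
    handle_job: Callable[[dict], None],
    max_jobs: Optional[int] = None,
    timeout: int = 5,
) -> int:
    """Pop jobs from QUEUE_KEY and handle them one at a time.

    `client` is anything with a redis-py style blpop(key, timeout=...) that
    returns a (key, value) pair, or None when the timeout expires.
    Returns the number of jobs taken off the queue.
    """
    done = 0
    while max_jobs is None or done < max_jobs:
        item = client.blpop(QUEUE_KEY, timeout=timeout)
        if item is None:
            continue  # timed out on an empty queue; block again
        _key, raw = item
        try:
            handle_job(json.loads(raw))
        except Exception as exc:
            # A bad payload or failing job must not kill the daemon.
            print(f"job failed: {exc!r}")
        done += 1
    return done
```

In a real deployment the client would be something like `redis.Redis(host="redis")`, with `max_jobs` left as `None` so the loop runs until the process is stopped. Passing the client in also lets the loop be exercised with a fake in tests.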

## Pipeline Phases (Video Accessibility)
```
1. Upload → Ingestion worker
2. Gemini 2.5 Pro → VTT captions
3. Audio Description generation
4. QC review (approve/reject/edit VTT)
5. Translation → 50+ languages
6. TTS synthesis (GCP TTS + ElevenLabs)
7. Final delivery
```

## Gotchas & Lessons
- Celery beat needs its own container: it is a separate scheduler process that only enqueues tasks on schedule, and the workers execute them
- Long Celery jobs that need M365 access must refresh their tokens proactively, before the token expires mid-job (Enterprise Nexus)
- A plain `worker.py` daemon is a simpler alternative to Celery when there is only a single queue (PDF Accessibility pattern)
- Always store job status in the database, not only in Redis, so it survives a Redis restart
- `video_accessibility_development_plan.txt` is the authoritative spec for the video pipeline; always read it before touching that pipeline
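
The point about keeping job status in the database can be sketched with the standard-library `sqlite3` module standing in for the project's real database. The table name, columns, and function names are hypothetical.

```python
import sqlite3
from typing import Optional


def init_db(conn: sqlite3.Connection) -> None:
    """Create the jobs table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " job_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.commit()


def set_status(conn: sqlite3.Connection, job_id: str, status: str) -> None:
    """Write the job's status, replacing any earlier row for the same job."""
    # REPLACE removes an existing row with this job_id and inserts a fresh
    # one, so updated_at is reset by its column default.
    conn.execute(
        "INSERT OR REPLACE INTO jobs (job_id, status) VALUES (?, ?)",
        (job_id, status),
    )
    conn.commit()


def get_status(conn: sqlite3.Connection, job_id: str) -> Optional[str]:
    """Return the stored status, or None for an unknown job."""
    row = conn.execute(
        "SELECT status FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else None
```

The worker would call `set_status` at each transition and the status endpoint would read with `get_status`, so the record of what has finished does not depend on Redis staying up.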

## Related
- [[wiki/tech-patterns/fastapi-python-docker|fastapi-python-docker]]: the API layer above
- [[wiki/tech-patterns/python-ai-agents|python-ai-agents]]: what the workers execute
- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]]: why polling beats streaming