Adaptive Rate Limit Backoff with API Retry-After Headers

When to use

Use this pattern when integrating with third-party APIs that enforce rate limits (HTTP 429 responses) and return dynamic retry delays, especially when multiple concurrent jobs can exhaust the limit unexpectedly. It avoids retry storms and respects the wait times the API mandates.

Prerequisites

Celery or similar async task queue for job retry management
asyncio for concurrent request handling
API that returns retry-after headers or retryDelay fields in responses
Rate limiting already in place but insufficient for concurrent workloads

Steps

Parse the rate limit response (429 status code) to extract retry delay information:
- Check the HTTP Retry-After header (standard); its value is either a number of seconds or an HTTP date
- Parse API-specific JSON fields like retryDelay: "37s" or text patterns like "retry in 37s"
Extract numeric delay value and convert to seconds
Use extracted delay as countdown for Celery task retry:
```
# `task` is the bound task instance (bind=True); retry() raises celery.exceptions.Retry
raise task.retry(countdown=retry_delay_seconds)
```
For asyncio-based retries, wait out the API-provided delay before retrying, rather than guessing with a fixed or exponential schedule:
```
await asyncio.sleep(retry_delay_seconds)
await retry_operation()
```
Log the parsed delay to track rate limit events and to verify that retries honor the wait the API requested
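
The parsing steps above can be sketched as a single helper. This is a minimal sketch, not the project's implementation: the function name `parse_retry_delay`, its parameters, and the assumption that `retryDelay` is a top-level key of the decoded JSON body are all hypothetical. It tries the standard header first, then the JSON field, then a free-text message.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# Matches a JSON duration value such as "37s" or "1.5s".
_DURATION = re.compile(r"(\d+(?:\.\d+)?)\s*s")
# Matches free text such as "retry in 37s".
_TEXT = re.compile(r"retry in\s+(\d+(?:\.\d+)?)\s*s", re.IGNORECASE)


def parse_retry_delay(
    headers: Mapping[str, str],
    body: Optional[Mapping[str, object]] = None,
    message: str = "",
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return the requested delay in seconds, or None if none is found."""
    # 1. Standard header, looked up case-insensitively.
    raw = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if raw is not None:
        raw = raw.strip()
        if raw.isascii() and raw.isdigit():
            return float(raw)  # delay-seconds form, e.g. "37"
        try:
            target = parsedate_to_datetime(raw)  # HTTP-date form
        except (TypeError, ValueError):
            target = None
        if target is not None:
            if target.tzinfo is None:
                target = target.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (target - current).total_seconds())
    # 2. JSON field (assumed here to be a top-level key).
    if body is not None:
        match = _DURATION.fullmatch(str(body.get("retryDelay", "")).strip())
        if match:
            return float(match.group(1))
    # 3. Free-text fallback.
    match = _TEXT.search(message)
    return float(match.group(1)) if match else None
```

A caller would pass the response headers, the decoded JSON body if there is one, and any error message text, then use the returned value as `retry_delay_seconds`. A `None` result means the response carried no usable delay, so the caller has to pick its own fallback.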

Key Configuration

Static backoff (old): 1-3 seconds — insufficient for APIs requiring 30+ second waits
Dynamic backoff (new): Parse retryDelay from API response and use as source of truth
Concurrency control: Monitor the number of parallel jobs. If tests or increased traffic trigger 8 or more concurrent TTS requests against a 10 RPM (requests per minute) limit, the limit is exhausted almost immediately
Retry limit: Set reasonable max retries to prevent infinite loops (e.g., 3-5 attempts)
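
To illustrate how the dynamic delay and the retry limit fit together on the asyncio side, here is a minimal sketch. The exception class `RateLimited`, the function `call_with_retry`, and the 30-second fallback are assumptions made for illustration; the real client would raise its own error type and supply the parsed delay.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by the caller's client on a 429 response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # parsed delay in seconds, if any


async def call_with_retry(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    fallback_delay: float = 30.0,  # assumed default when no delay was parsed
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run `operation`, retrying on RateLimited at most `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except RateLimited as exc:
            if attempt == max_retries:
                raise  # retry budget exhausted; surface the error
            # The server's delay is the source of truth; fall back only if absent.
            delay = exc.retry_after if exc.retry_after is not None else fallback_delay
            await sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the wait can be replaced in tests. In production the default `asyncio.sleep` is used.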

Gotchas

Hardcoded backoff too short: Fixed 1-3 second delays will immediately fail against 429 responses requiring 30+ seconds; always parse and respect API-provided delays
Ignoring Retry-After header: Retrying before the required delay causes immediate re-throttling and cascading failures
Concurrency spike: Tests or batch operations can suddenly exceed rate limits previously considered safe; monitor actual concurrent request rates vs. API limits
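Pattern parsing: Different APIs format retry delays differently. Implement flexible parsing that handles the header format ("37"), the JSON format (retryDelay: "37s"), and free-text messages ("retry in 37s")
Task queue vs asyncio mismatch: Use Celery countdown for Celery tasks and asyncio.sleep() for async operations. Do not mix the two: sleeping inside a Celery task ties up a worker slot for the whole wait, whereas countdown reschedules the task and frees the worker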
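
One possible guard against the concurrency spike described above is to cap in-flight requests with a semaphore. This is a sketch under assumptions: `run_limited` and the cap of 2 are hypothetical, and a concurrency cap bounds simultaneous requests rather than requests per minute, so it complements the retry-delay handling instead of replacing it.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def run_limited(
    jobs: Sequence[Callable[[], Awaitable[T]]],
    max_concurrent: int = 2,  # assumed cap; tune it against the API's limit
) -> List[T]:
    """Run all jobs, allowing at most `max_concurrent` to be in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        # Each job waits here until a slot is free.
        async with semaphore:
            return await job()

    # gather preserves the order of `jobs` in its result list.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

With eight queued jobs and a cap of 2, only two requests reach the API at any moment, and the rest wait their turn.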

Source

Project: video-accessibility
