obsidian/wiki/tech-patterns/adaptive-rate-limit-backoff-strategy.md
2026-05-08 17:09:20 +01:00

tags: tech-patterns, auto-generated
source: video-accessibility
created: 2026-05-08

Adaptive Rate Limit Backoff with API Retry-After Headers

When to use

When integrating with third-party APIs that enforce rate limits (429 responses) and return dynamic retry delays, especially when multiple concurrent jobs can exhaust limits unexpectedly. Use this pattern to avoid retry storms and respect API-mandated wait times.

Prerequisites

  • Celery or similar async task queue for job retry management
  • asyncio for concurrent request handling
  • API that returns retry-after headers or retryDelay fields in responses
  • Rate limiting already in place but insufficient for concurrent workloads

Steps

  1. Parse the rate limit response (429 status code) to extract retry delay information:
    • Check the standard HTTP Retry-After header, which carries either a number of seconds or an HTTP-date
    • Parse API-specific JSON fields like retryDelay: "37s" or text patterns like "retry in 37s"
  2. Extract numeric delay value and convert to seconds
  3. Use the extracted delay as the countdown for the Celery task retry (inside a task declared with bind=True):
    raise self.retry(countdown=retry_delay_seconds)
    
  4. For asyncio-based retries, sleep for the API-provided delay before retrying; fall back to exponential backoff only when the response carries no delay:
    await asyncio.sleep(retry_delay_seconds)
    await retry_operation()
    
  5. Log the parsed delay on every 429 so rate limit events can be tracked and the actual wait can be checked against what the API requested
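
Steps 1 and 2 can be sketched as a single parsing helper. This is a minimal sketch under stated assumptions, not the project's actual code: the function name parse_retry_delay, the regular expression, and the choice to receive the response body as text are all hypothetical. It checks the Retry-After header first (seconds or HTTP-date), then falls back to body patterns such as retryDelay: "37s" or "retry in 37s".

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# Hypothetical pattern: matches `retryDelay": "37s"`, `retryDelay: '37s'`
# and free text such as "retry in 37s". Adjust it to the real API's format.
_BODY_DELAY = re.compile(
    r"retry(?:Delay)?\W+(?:in\s+)?(\d+(?:\.\d+)?)\s*s\b", re.IGNORECASE
)


def parse_retry_delay(
    headers: Mapping[str, str],
    body: str,
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return the delay in seconds requested by a 429 response, or None."""
    # Header names are case-insensitive, so compare in lowercase.
    header_value = next(
        (v for k, v in headers.items() if k.lower() == "retry-after"), None
    )
    if header_value is not None:
        header_value = header_value.strip()
        if header_value.isdigit():
            # Delay-seconds form, e.g. "37".
            return float(header_value)
        try:
            # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
            retry_at = parsedate_to_datetime(header_value)
        except (TypeError, ValueError):
            retry_at = None
        if retry_at is not None:
            if retry_at.tzinfo is None:
                retry_at = retry_at.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            # A date in the past means no further wait is required.
            return max(0.0, (retry_at - current).total_seconds())

    # No usable header: look for an API-specific hint in the body text.
    match = _BODY_DELAY.search(body or "")
    if match:
        return float(match.group(1))
    return None
```

A None result means the response carried no usable hint, so the caller has to pick its own delay.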

Key Configuration

  • Static backoff (old): 1-3 seconds — insufficient for APIs requiring 30+ second waits
  • Dynamic backoff (new): Parse retryDelay from API response and use as source of truth
  • Concurrency control: Monitor parallel job count. If tests or increased traffic trigger 8+ concurrent TTS requests against a 10 RPM limit, the first burst consumes most of the per-minute quota and any retry or follow-up request is throttled
  • Retry limit: Set reasonable max retries to prevent infinite loops (e.g., 3-5 attempts)
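
The dynamic backoff and retry limit settings above can be combined in one asyncio retry loop. The following is a sketch only: RateLimited is a hypothetical exception that the caller's HTTP client would raise on a 429, call_with_retry and its parameters are illustrative names, and the exponential fallback used when the API gives no delay is an assumption rather than something the project is known to do.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by the caller's client on a 429 response."""

    def __init__(self, retry_delay_seconds: Optional[float] = None) -> None:
        super().__init__("rate limited")
        self.retry_delay_seconds = retry_delay_seconds


async def call_with_retry(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 4,
    fallback_base: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run operation, waiting for the API-provided delay between attempts."""
    attempt = 0
    while True:
        try:
            return await operation()
        except RateLimited as exc:
            if attempt >= max_retries:
                # Retry budget exhausted: surface the error to the caller.
                raise
            if exc.retry_delay_seconds is not None:
                # The API-provided delay is the source of truth.
                delay = exc.retry_delay_seconds
            else:
                # Assumed fallback: exponential backoff when no delay is given.
                delay = fallback_base * 2 ** attempt
            logger.warning(
                "429 received, retry %d/%d in %.1fs",
                attempt + 1, max_retries, delay,
            )
            await sleep(delay)
            attempt += 1
```

The sleep parameter exists only so that tests can substitute a fake and avoid real waits; production callers would leave it at the default.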

Gotchas

  • Hardcoded backoff too short: Fixed 1-3 second delays retry far too early when a 429 response requires 30+ seconds, so every retry is throttled again; always parse and respect API-provided delays
  • Ignoring Retry-After header: Retrying before the required delay causes immediate re-throttling and cascading failures
  • Concurrency spike: Tests or batch operations can suddenly exceed rate limits previously considered safe; monitor actual concurrent request rates vs. API limits
  • Pattern parsing: Different APIs format retry delays differently. Implement flexible parsing for the header format ("37" seconds or an HTTP-date) and for JSON fields (retryDelay: "37s")
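  • Task queue vs asyncio mismatch: Use Celery countdown for Celery tasks and asyncio.sleep() for async operations. Sleeping inside a Celery task holds a worker slot for the whole delay, whereas countdown releases the worker until the retry is due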
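
One way to blunt the concurrency spike described above is to cap the number of in-flight requests with asyncio.Semaphore. This is an illustrative sketch: run_limited and its parameters are hypothetical names, and the returned peak value is only there to show how actual concurrency could be monitored. A concurrency cap reduces bursts but does not by itself enforce a per-minute quota, so the API-provided delay still has to be respected.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


async def run_limited(
    jobs: Sequence[Callable[[], Awaitable[T]]],
    max_concurrent: int,
) -> Tuple[List[T], int]:
    """Run jobs with at most max_concurrent in flight; also report the peak."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        nonlocal in_flight, peak
        async with semaphore:
            # Track concurrency so it can be logged and compared to the limit.
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    # gather preserves the order of the input jobs in its results.
    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return list(results), peak
```

For a 10 RPM limit, a small max_concurrent value (an assumption to tune per API) keeps a batch of 8+ jobs from firing all at once.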

Source

Project: video-accessibility