obsidian/wiki/tech-patterns/websocket-keepalive-terminal-close.md
2026-05-01 12:15:13 +01:00

title: WebSocket Keepalive + Terminal Close Codes
description: Bidirectional 20s keepalive pattern + terminal close code list to prevent reconnect storms through Apache mod_proxy_wstunnel
tags: websocket, apache, react, fastapi, reliability
created: 2026-05-01
updated: 2026-05-01
projects: video-accessibility, mod-comms

WebSocket Keepalive + Terminal Close Codes

Problem

Apache mod_proxy_wstunnel (and the GCP HTTP(S) load balancer) drop idle WebSocket connections. If the client auto-reconnects on every close code, a permanent failure such as a permission denial triggers an infinite reconnect storm that drains the battery and exhausts the auth rate limit.

Mod Comms incident (2026-03-18): a 25s heartbeat was not safe through Apache, because every second ping raced the proxy idle timer. A 30s heartbeat (the next candidate) missed the timer entirely.

Solution: two parts

1. Bidirectional 20s keepalive

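Client (React): send a "ping" text message every 20 000 ms. The browser WebSocket API cannot send protocol-level ping control frames, so the heartbeat is an ordinary data message.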

// useJobStatusWebSocket.ts
const HEARTBEAT_MS = 20_000; // was 30_000 — lowered after Mod Comms incident

heartbeatIntervalRef.current = setInterval(() => {
  if (wsRef.current?.readyState === WebSocket.OPEN) {
    wsRef.current.send('ping');
  }
}, HEARTBEAT_MS);
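Because the interval has to be cleared whenever the socket closes, it can help to pull the heartbeat out of the hook. This is a hypothetical refactor, not the code in useJobStatusWebSocket.ts: SocketLike, heartbeatTick and startHeartbeat are illustrative names, and SocketLike stands in for the browser WebSocket so the logic can run outside a browser.

```typescript
// Minimal stand-in for the browser WebSocket: only what the heartbeat touches.
interface SocketLike {
  readonly readyState: number;
  send(data: string): void;
}

const OPEN = 1; // same value as WebSocket.OPEN in the browser API
const HEARTBEAT_MS = 20_000;

// One heartbeat tick: send the text message "ping" only while the socket is
// open (send() throws while the socket is still CONNECTING). Returns true if sent.
function heartbeatTick(socket: SocketLike): boolean {
  if (socket.readyState !== OPEN) return false;
  socket.send("ping");
  return true;
}

// Hypothetical helper: starts the heartbeat and returns a stop function.
// clearInterval is a no-op on an already-cleared id, so calling stop from both
// onclose and the effect cleanup is safe.
function startHeartbeat(socket: SocketLike, periodMs: number = HEARTBEAT_MS): () => void {
  const intervalId = setInterval(() => heartbeatTick(socket), periodMs);
  return () => clearInterval(intervalId);
}
```

In the hook, the stop function returned by startHeartbeat would be kept in a ref and called from both onclose and the effect cleanup, which covers the "no interval leak" checklist item.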

Server (FastAPI): answer each "ping" with a pong, and if no client message arrives within 20s, send a keepalive message.

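# routes_websockets.py
import asyncio

from fastapi import WebSocketDisconnect

# Inside the WebSocket endpoint, after await websocket.accept():
while True:
    try:
        # Wait up to 20s for a client message (normally the "ping" heartbeat).
        msg = await asyncio.wait_for(websocket.receive_text(), timeout=20.0)
        if msg == "ping":
            await websocket.send_json({"type": "pong"})
    except asyncio.TimeoutError:
        # Client silent for 20s: send traffic so the proxy idle timer resets.
        await websocket.send_json({"type": "keepalive"})
    except WebSocketDisconnect:
        break  # client went away; stop the loop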

Traffic crosses the proxy at least every 20s (client pings and their pongs, or server keepalives while the client is silent), so the Apache idle timer never fires.
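The "traffic at least every 20s" argument can be checked with a toy timeline. This is an illustrative model, not a measurement: longestSilenceS is a hypothetical name, and the model assumes 1-second resolution, an instant pong, and a server that restarts its 20s wait after every message or keepalive, like the wait_for loop above.

```typescript
// Toy model at 1-second resolution of the frames crossing the proxy.
// Assumptions: the client pings every clientPeriodS seconds and the server
// answers at once; the server sends a keepalive after serverIdleS seconds
// without a client message (pass Infinity to model "no server keepalive").
function longestSilenceS(clientPeriodS: number, serverIdleS: number, horizonS = 600): number {
  let lastFrame = 0;       // second at which the last frame crossed the proxy
  let serverWaitStart = 0; // second at which the server's current wait began
  let longest = 0;
  for (let t = 1; t <= horizonS; t++) {
    let frame = false;
    if (t % clientPeriodS === 0) {
      frame = true;        // client ping + server pong
      serverWaitStart = t; // a received message restarts the server's wait
    } else if (t - serverWaitStart >= serverIdleS) {
      frame = true;        // server keepalive after the idle timeout
      serverWaitStart = t;
    }
    if (frame) {
      longest = Math.max(longest, t - lastFrame);
      lastFrame = t;
    }
  }
  // Count the silence between the last frame and the end of the horizon too.
  return Math.max(longest, horizonS - lastFrame);
}
```

longestSilenceS(20, 20) and longestSilenceS(60, 20) both return 20, while longestSilenceS(60, Infinity) returns 60: when client pings arrive late, the server keepalive is what keeps the silence bounded.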

2. Terminal close codes — no reconnect

const TERMINAL_CLOSE_CODES = new Set([
  1000,  // normal closure
  4001,  // unauthenticated
  4003,  // forbidden
  4004,  // not found
  4403,  // org access denied (cross-tenant)
]);

// In handleClose:
const isTerminal = TERMINAL_CLOSE_CODES.has(event.code);
if (!isTerminal) {
  scheduleReconnect();
}
if (isTerminal && event.code !== 1000 && onTerminalClose) {
  onTerminalClose(event.code, event.reason); // surface via toast
}
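Pulling the close-code decision into a pure function makes each branch easy to unit-test. A sketch, with classifyClose and CloseAction as hypothetical names; the code list is copied from TERMINAL_CLOSE_CODES above.

```typescript
// Same codes as TERMINAL_CLOSE_CODES above; 4000-4999 are application-defined.
const TERMINAL_CLOSE_CODES: ReadonlySet<number> = new Set([1000, 4001, 4003, 4004, 4403]);

type CloseAction = "reconnect" | "stop" | "stop-and-notify";

// Pure version of the handleClose branch: no socket, timers, or React state,
// so every code path can be unit-tested.
function classifyClose(code: number): CloseAction {
  // Anything unlisted (e.g. 1006 when the proxy drops the TCP connection) reconnects.
  if (!TERMINAL_CLOSE_CODES.has(code)) return "reconnect";
  // 1000 is a deliberate normal closure: stop quietly.
  if (code === 1000) return "stop";
  // Auth/permission failures: stop and surface the reason (toast).
  return "stop-and-notify";
}
```

handleClose could then call scheduleReconnect() on "reconnect" and onTerminalClose?.(event.code, event.reason) on "stop-and-notify"; "stop" needs no action.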

Apache config requirement

ProxyTimeout must be ≥ 60s on the Apache vhost (≥ 3× the heartbeat period for safety margin).

# video-accessibility uses 600 for large uploads; this also covers WS.
# (Apache does not allow a comment on the same line as a directive.)
ProxyTimeout 600

Minimum for WS-only deployments: ProxyTimeout 60.
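The timing rules in this note can be encoded as a pre-merge check. This is a hypothetical helper (KeepaliveConfig and keepaliveProblems are not from either project); it treats 20s as the ceiling for both the client heartbeat and the server keepalive, and requires ProxyTimeout to be at least 60s and at least 3× the heartbeat period.

```typescript
// Timing settings this note cares about (field names are hypothetical).
interface KeepaliveConfig {
  clientHeartbeatMs: number;  // client "ping" interval (HEARTBEAT_MS)
  serverIdleTimeoutS: number; // asyncio.wait_for timeout in the server loop
  proxyTimeoutS: number;      // Apache ProxyTimeout on the vhost
}

// Returns human-readable violations; an empty array means the config passes.
function keepaliveProblems(cfg: KeepaliveConfig): string[] {
  const problems: string[] = [];
  const heartbeatS = cfg.clientHeartbeatMs / 1000;
  if (heartbeatS > 20) problems.push(`client heartbeat ${heartbeatS}s > 20s`);
  if (cfg.serverIdleTimeoutS > 20) problems.push(`server keepalive ${cfg.serverIdleTimeoutS}s > 20s`);
  // ProxyTimeout: at least 60s and at least 3x the heartbeat period.
  const minProxyS = Math.max(60, 3 * heartbeatS);
  if (cfg.proxyTimeoutS < minProxyS) problems.push(`ProxyTimeout ${cfg.proxyTimeoutS}s < ${minProxyS}s`);
  return problems;
}
```

With the video-accessibility values (20 000 ms heartbeat, 20s server timeout, ProxyTimeout 600) it returns an empty list; a 25s heartbeat behind ProxyTimeout 60 is flagged twice, once for the heartbeat and once because 60s is below 3 × 25s.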

Checklist before merging WS changes

  • Client heartbeat ≤ 20s (25s raced the Apache idle timer in the Mod Comms incident)
  • Server keepalive frame on idle (≤ 20s)
  • TERMINAL_CLOSE_CODES covers 1000, 4001, 4003, 4004, 4403
  • Token-null guard: if (!accessToken) return; at the top of the effect, with accessToken in its dependency array
  • Both /ws/jobs/{id} AND /ws/jobs (list) get org filtering
  • heartbeatIntervalRef cleared on close (no interval leak)
  • Toast surfaced on terminal close (not silent disconnect)

Projects using this pattern

  • video-accessibility: useJobStatusWebSocket.ts, routes_websockets.py
  • mod-comms: triggered the original 2026-03-18 incident