---
title: "WebSocket Keepalive + Terminal Close Codes"
description: "Bidirectional 20s keepalive pattern + terminal close code list to prevent reconnect storms through Apache mod_proxy_wstunnel"
tags: [websocket, apache, react, fastapi, reliability]
created: 2026-05-01
updated: 2026-05-01
projects: [video-accessibility, mod-comms]
---

# WebSocket Keepalive + Terminal Close Codes

## Problem

Apache `mod_proxy_wstunnel` (and GCP HTTP LB) drop idle WebSocket connections. If the client auto-reconnects on every close code, a permanent permission failure causes an infinite reconnect storm: battery drain and auth rate-limit exhaustion.

**Mod Comms incident (2026-03-18):** a 25s heartbeat was not safe through Apache, because every second ping raced the proxy idle timer. A 30s heartbeat (the next candidate) missed the timer entirely.

## Solution: two parts

### 1. Bidirectional 20s keepalive

**Client (React):** send a `ping` text message every 20 000 ms. This is an application-level message, not a WebSocket protocol ping frame, which browsers cannot send.

```ts
// useJobStatusWebSocket.ts
const HEARTBEAT_MS = 20_000; // was 30_000; lowered after the Mod Comms incident

heartbeatIntervalRef.current = setInterval(() => {
  // Only send while the socket is open; send() throws while CONNECTING.
  if (wsRef.current?.readyState === WebSocket.OPEN) {
    wsRef.current.send('ping');
  }
}, HEARTBEAT_MS);
```

**Server (FastAPI):** if no client message arrives for 20s, emit a `keepalive` frame.

```python
# routes_websockets.py
# Excerpt from inside the WebSocket route handler; requires `import asyncio`.
while True:
    try:
        # Wait up to 20s for a client message; a timeout means the link is idle.
        msg = await asyncio.wait_for(websocket.receive_text(), timeout=20.0)
        if msg == "ping":
            await websocket.send_json({"type": "pong"})
    except asyncio.TimeoutError:
        # No client traffic for 20s: send a frame so the proxy sees activity.
        await websocket.send_json({"type": "keepalive"})
```

Both sides generate traffic at least every 20s, so the Apache idle timer never fires.

### 2. Terminal close codes: no reconnect

```ts
const TERMINAL_CLOSE_CODES = new Set([
  1000, // normal closure
  4001, // unauthenticated
  4003, // forbidden
  4004, // not found
  4403, // org access denied (cross-tenant)
]);

// In handleClose:
const isTerminal = TERMINAL_CLOSE_CODES.has(event.code);
if (!isTerminal) {
  scheduleReconnect();
}
if (isTerminal && event.code !== 1000 && onTerminalClose) {
  onTerminalClose(event.code, event.reason); // surface via toast
}
```

## Apache config requirement

`ProxyTimeout` must be ≥ 60s on the Apache vhost (≥ 3× the heartbeat period for safety margin). Apache does not allow a comment on the same line as a directive, so keep the comment on its own line.

```apache
# video-accessibility uses 600 for large uploads; this also covers WS
ProxyTimeout 600
```

Minimum for WS-only deployments: `ProxyTimeout 60`.

## Checklist before merging WS changes

- [ ] Client heartbeat < 25s (use 20s)
- [ ] Server keepalive frame on idle (≤ 20s)
- [ ] `TERMINAL_CLOSE_CODES` covers 1000, 4001, 4003, 4004, 4403
- [ ] Token-null guard: `if (!accessToken) return` + `accessToken` in dep array
- [ ] Both `/ws/jobs/{id}` AND `/ws/jobs` (list) get org filtering
- [ ] `heartbeatIntervalRef` cleared on close (no interval leak)
- [ ] Toast surfaced on terminal close (not silent disconnect)

## Projects using this pattern

- **video-accessibility**: `useJobStatusWebSocket.ts`, `routes_websockets.py`
- **mod-comms**: triggered the original 2026-03-18 incident
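
## Sketch: close handling as a pure function

The close-code rules and the 3× timeout margin described in this note can be restated as two small pure functions, which makes them easy to unit-test outside the React hook. This is a minimal sketch, not the code from either project: `decideOnClose`, `CloseDecision`, and `minProxyTimeoutSeconds` are hypothetical names, and the sketch ignores the optional `onTerminalClose` callback check. Only the code list and the 3× rule come from the sections above.

```typescript
// Close codes after which the client must NOT reconnect (list from this note).
const TERMINAL_CLOSE_CODES: ReadonlySet<number> = new Set([
  1000, // normal closure
  4001, // unauthenticated
  4003, // forbidden
  4004, // not found
  4403, // org access denied (cross-tenant)
]);

// Hypothetical result type: what the close handler should do next.
interface CloseDecision {
  reconnect: boolean; // schedule another connection attempt
  notify: boolean;    // surface the failure to the user, e.g. via toast
}

// Hypothetical helper mirroring the handleClose logic for a given close code.
function decideOnClose(code: number): CloseDecision {
  const isTerminal = TERMINAL_CLOSE_CODES.has(code);
  return {
    reconnect: !isTerminal,
    // 1000 is a normal closure: terminal, but nothing to report.
    notify: isTerminal && code !== 1000,
  };
}

// Hypothetical helper: smallest ProxyTimeout (in seconds) that keeps
// a 3x margin over the heartbeat period.
function minProxyTimeoutSeconds(heartbeatMs: number): number {
  return Math.ceil((heartbeatMs * 3) / 1000);
}
```

With these definitions, an abnormal drop such as code 1006 yields a reconnect without a notification, a 4403 yields a notification without a reconnect, and `minProxyTimeoutSeconds(20_000)` evaluates to 60, matching the `ProxyTimeout 60` minimum.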