| title | description | tags | created | updated | projects |
|---|---|---|---|---|---|
| WebSocket Keepalive + Terminal Close Codes | Bidirectional 20s keepalive pattern + terminal close code list to prevent reconnect storms through Apache mod_proxy_wstunnel | | 2026-05-01 | 2026-05-01 | |
WebSocket Keepalive + Terminal Close Codes
Problem
Apache mod_proxy_wstunnel (and GCP HTTP LB) drop idle WebSocket connections. If the client auto-reconnects on every close code, a permanent permission failure causes an infinite reconnect storm: battery drain + auth rate-limit exhaustion.
Mod Comms incident (2026-03-18): a 25s heartbeat was not safe through Apache, because every other ping raced the proxy idle timer. 30s (the next candidate) missed the timer entirely.
Solution: two parts
1. Bidirectional 20s keepalive
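Client (React): send a text ping message every 20 000 ms. Browsers cannot send protocol-level ping frames, so the heartbeat is an ordinary text message.
    // useJobStatusWebSocket.ts
    const HEARTBEAT_MS = 20_000; // was 30_000; lowered after Mod Comms incident
    heartbeatIntervalRef.current = setInterval(() => {
      // Only send while the socket is open; send() throws while still CONNECTING.
      if (wsRef.current?.readyState === WebSocket.OPEN) {
        wsRef.current.send('ping');
      }
    }, HEARTBEAT_MS);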
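The interval above also needs a matching cleanup so it does not outlive the socket. A minimal sketch of that pairing, assuming a hypothetical `startHeartbeat` helper and `PingTarget` interface (neither appears in `useJobStatusWebSocket.ts`); the tick logic is split out so it can be exercised without timers:

```typescript
// Hypothetical helper; names are illustrative, not from useJobStatusWebSocket.ts.
const HEARTBEAT_MS = 20_000;
const SOCKET_OPEN = 1; // numeric value of WebSocket.OPEN

// Minimal shape of the socket that the heartbeat needs.
interface PingTarget {
  readyState: number;
  send(data: string): void;
}

// One heartbeat tick: send 'ping' only while the socket is open.
// Returns true when a ping was actually sent.
function sendPingIfOpen(socket: PingTarget): boolean {
  if (socket.readyState !== SOCKET_OPEN) return false;
  socket.send('ping');
  return true;
}

// Starts the periodic heartbeat and returns a stop function.
// Call the stop function from the close handler and from effect cleanup.
function startHeartbeat(socket: PingTarget, intervalMs: number = HEARTBEAT_MS): () => void {
  const id = setInterval(() => sendPingIfOpen(socket), intervalMs);
  return () => clearInterval(id);
}
```

In the hook, the returned stop function would play the same role as clearing `heartbeatIntervalRef` on close.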
Server (FastAPI): if no client message arrives for 20s, emit a keepalive frame.
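    # routes_websockets.py
    import asyncio
    from fastapi import WebSocketDisconnect

    # Inside the WebSocket route handler, after websocket.accept():
    while True:
        try:
            msg = await asyncio.wait_for(websocket.receive_text(), timeout=20.0)
            if msg == "ping":
                await websocket.send_json({"type": "pong"})
        except asyncio.TimeoutError:
            # No client message for 20s: emit a frame so the proxy sees traffic.
            await websocket.send_json({"type": "keepalive"})
        except WebSocketDisconnect:
            break  # client went away; leave the loop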
Both sides generate traffic every ≤20s → Apache idle timer never fires.
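Because the server emits `{"type": "pong"}` and `{"type": "keepalive"}` frames, the client's message handler has to skip them before treating a payload as job data. A minimal sketch, assuming a hypothetical `isControlFrame` helper; only the two frame types come from the server loop:

```typescript
// Hypothetical helper: the frame types match the server loop; the name is illustrative.
const CONTROL_FRAME_TYPES = new Set(['pong', 'keepalive']);

// Returns true when a raw message is keepalive traffic rather than job data.
function isControlFrame(raw: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // not JSON, so not one of the server's control frames
  }
  if (typeof parsed !== 'object' || parsed === null) return false;
  const type = (parsed as { type?: unknown }).type;
  return typeof type === 'string' && CONTROL_FRAME_TYPES.has(type);
}
```

In `onmessage`, the handler would return early whenever `isControlFrame(event.data)` is true.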
2. Terminal close codes — no reconnect
    const TERMINAL_CLOSE_CODES = new Set([
      1000, // normal closure
      4001, // unauthenticated
      4003, // forbidden
      4004, // not found
      4403, // org access denied (cross-tenant)
    ]);
    // In handleClose:
    const isTerminal = TERMINAL_CLOSE_CODES.has(event.code);
    if (!isTerminal) {
      scheduleReconnect();
    }
    if (isTerminal && event.code !== 1000 && onTerminalClose) {
      onTerminalClose(event.code, event.reason); // surface via toast
    }
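The two branches in `handleClose` can be expressed as one pure decision, which makes the close-code policy easy to unit test. A sketch assuming a hypothetical `decideOnClose` function and `CloseDecision` type; the code list is the one from this note:

```typescript
const TERMINAL_CLOSE_CODES: ReadonlySet<number> = new Set([1000, 4001, 4003, 4004, 4403]);

interface CloseDecision {
  reconnect: boolean; // schedule another connection attempt
  notify: boolean;    // surface the failure to the user (for example, a toast)
}

// Hypothetical helper mirroring the handleClose branches.
function decideOnClose(code: number): CloseDecision {
  const isTerminal = TERMINAL_CLOSE_CODES.has(code);
  return {
    reconnect: !isTerminal,
    notify: isTerminal && code !== 1000,
  };
}

console.log(decideOnClose(1006)); // abnormal closure, such as a proxy drop: reconnect, no toast
console.log(decideOnClose(4403)); // org access denied: no reconnect, toast
console.log(decideOnClose(1000)); // normal closure: no reconnect, no toast
```

Keeping the policy in one function means a new terminal code only has to be added to the set.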
Apache config requirement
ProxyTimeout must be ≥ 60s on the Apache vhost (≥ 3× the heartbeat period for safety margin).
    # video-accessibility uses 600 for large uploads; this also covers WS
    ProxyTimeout 600
Minimum for WS-only deployments: ProxyTimeout 60.
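The 60s floor is the 3× rule applied to the 20s heartbeat. A small sketch of that arithmetic, assuming hypothetical `minProxyTimeoutSeconds` and `proxyTimeoutIsSafe` helpers that a deployment check could call:

```typescript
const SAFETY_FACTOR = 3; // the 3x margin from the ProxyTimeout requirement

// Smallest ProxyTimeout (in seconds) that satisfies the margin for a given heartbeat.
function minProxyTimeoutSeconds(heartbeatMs: number): number {
  return Math.ceil((heartbeatMs * SAFETY_FACTOR) / 1000);
}

// True when a configured ProxyTimeout leaves the required margin.
function proxyTimeoutIsSafe(proxyTimeoutSeconds: number, heartbeatMs: number): boolean {
  return proxyTimeoutSeconds >= minProxyTimeoutSeconds(heartbeatMs);
}
```

With a 20 000 ms heartbeat the minimum comes out at 60, so `ProxyTimeout 600` passes and a value of 30 would not.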
Checklist before merging WS changes
- Client heartbeat < 25s (use 20s; 25s was not safe in the Mod Comms incident)
- Server keepalive frame on idle (≤ 20s)
- `TERMINAL_CLOSE_CODES` covers 1000, 4001, 4003, 4004, 4403
- Token-null guard: `if (!accessToken) return` + `accessToken` in dep array
- Both `/ws/jobs/{id}` AND `/ws/jobs` (list) get org filtering
- `heartbeatIntervalRef` cleared on close (no interval leak)
- Toast surfaced on terminal close (not silent disconnect)
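The token-null guard in the checklist keeps the client from opening a socket before it has credentials, which the server would presumably close as unauthenticated (4001). A framework-free sketch, assuming a hypothetical `connectWhenAuthenticated` helper; in the React hook the same check sits at the top of the effect, with `accessToken` in the dependency array so the effect re-runs once a token arrives:

```typescript
// Hypothetical helper; `connect` stands in for whatever opens the WebSocket.
function connectWhenAuthenticated<T>(
  accessToken: string | null | undefined,
  connect: (token: string) => T,
): T | null {
  if (!accessToken) return null; // no token yet: do not open a socket
  return connect(accessToken);
}
```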
Projects using this pattern
- video-accessibility: `useJobStatusWebSocket.ts`, `routes_websockets.py`
- mod-comms: triggered the original 2026-03-18 incident