---
title: "WebSocket Keepalive + Terminal Close Codes"
description: "Bidirectional 20s keepalive pattern + terminal close code list to prevent reconnect storms through Apache mod_proxy_wstunnel"
tags: [websocket, apache, react, fastapi, reliability]
created: 2026-05-01
updated: 2026-05-01
projects: [video-accessibility, mod-comms]
---

# WebSocket Keepalive + Terminal Close Codes

## Problem

Apache `mod_proxy_wstunnel` (and GCP HTTP LB) drop idle WebSocket connections. If the client auto-reconnects on every close code, a permanent permission failure causes an infinite reconnect storm: battery drain + auth rate-limit exhaustion.

**Mod Comms incident (2026-03-18):** a 25s heartbeat was not safe through Apache because every other ping raced the proxy idle timer. A 30s heartbeat (the next candidate) missed the timer entirely.

## Solution: two parts

### 1. Bidirectional 20s keepalive

**Client (React):** send a `ping` text frame every 20 000 ms.

```ts
// useJobStatusWebSocket.ts
const HEARTBEAT_MS = 20_000; // was 30_000; lowered after the Mod Comms incident

heartbeatIntervalRef.current = setInterval(() => {
  if (wsRef.current?.readyState === WebSocket.OPEN) {
    wsRef.current.send('ping');
  }
}, HEARTBEAT_MS);
```

**Server (FastAPI):** if no client message arrives for 20s, emit a `keepalive` frame.

```python
# routes_websockets.py
# Runs inside the WebSocket route handler; needs `import asyncio` at module level.
while True:
    try:
        # Wait up to 20s for the next client message.
        msg = await asyncio.wait_for(websocket.receive_text(), timeout=20.0)
        if msg == "ping":
            await websocket.send_json({"type": "pong"})
    except asyncio.TimeoutError:
        # Client was silent for 20s: send traffic from the server side.
        await websocket.send_json({"type": "keepalive"})
```

Each side sends traffic at least once every 20s, so the Apache idle timer never fires.
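
To make the server half of this pattern easier to try in isolation, here is a minimal sketch of the same receive loop run against an in-memory stand-in. `FakeWebSocket`, `keepalive_loop`, the `idle_timeout` parameter, and the `"close"` sentinel are all hypothetical names introduced for illustration; they are not part of `routes_websockets.py` or of FastAPI.

```python
import asyncio


class FakeWebSocket:
    """Hypothetical stand-in exposing only the two methods the loop uses."""

    def __init__(self):
        self.inbox = asyncio.Queue()  # messages "from the client"
        self.sent = []                # payloads the server sent back

    async def receive_text(self):
        return await self.inbox.get()

    async def send_json(self, payload):
        self.sent.append(payload)


async def keepalive_loop(websocket, idle_timeout=20.0):
    """Answer pings; emit a keepalive whenever the client is silent too long."""
    while True:
        try:
            msg = await asyncio.wait_for(websocket.receive_text(), timeout=idle_timeout)
        except asyncio.TimeoutError:
            await websocket.send_json({"type": "keepalive"})
            continue
        if msg == "close":  # assumption: sentinel so the demo can stop the loop
            return
        if msg == "ping":
            await websocket.send_json({"type": "pong"})


async def demo():
    ws = FakeWebSocket()
    # A short timeout stands in for the 20s production value.
    task = asyncio.create_task(keepalive_loop(ws, idle_timeout=0.1))
    await ws.inbox.put("ping")   # client heartbeat arrives
    await asyncio.sleep(0.15)    # client then stays quiet past the timeout
    await ws.inbox.put("close")
    await task
    return ws.sent


sent = asyncio.run(demo())
```

In this run the first payload in `sent` is the `pong` reply, followed by a `keepalive` once the idle timeout elapses. Parameterizing the timeout is a design choice for testability only; the pattern above keeps it at 20s.
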

### 2. Terminal close codes: no reconnect

```ts
const TERMINAL_CLOSE_CODES = new Set([
  1000, // normal closure
  4001, // unauthenticated
  4003, // forbidden
  4004, // not found
  4403, // org access denied (cross-tenant)
]);

// In handleClose:
const isTerminal = TERMINAL_CLOSE_CODES.has(event.code);
if (!isTerminal) {
  scheduleReconnect();
}
if (isTerminal && event.code !== 1000 && onTerminalClose) {
  onTerminalClose(event.code, event.reason); // surface via toast
}
```
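
The close-handling rules above reduce to a three-way decision. The sketch below restates them in Python as a pure function, which may be handy for a non-browser client or for unit-testing the policy. `CloseAction` and `classify_close` are hypothetical names, and code 1006 (the standard "abnormal closure" code a client reports when the connection drops without a close frame) is included only as an example of a non-terminal code.

```python
from enum import Enum

# Same codes as the TypeScript set above.
TERMINAL_CLOSE_CODES = frozenset({1000, 4001, 4003, 4004, 4403})


class CloseAction(Enum):
    RECONNECT = "reconnect"              # transient failure: try again
    STOP_SILENTLY = "stop_silently"      # normal closure: nothing to report
    STOP_AND_NOTIFY = "stop_and_notify"  # permanent failure: tell the user


def classify_close(code: int) -> CloseAction:
    """Mirror of the handleClose branches, without the callback check."""
    if code not in TERMINAL_CLOSE_CODES:
        return CloseAction.RECONNECT
    if code == 1000:
        return CloseAction.STOP_SILENTLY
    return CloseAction.STOP_AND_NOTIFY


decisions = {code: classify_close(code).value for code in (1000, 1006, 4001, 4403)}
```

Here `decisions` maps 1006 to `reconnect`, 1000 to `stop_silently`, and both 4001 and 4403 to `stop_and_notify`. The TypeScript version additionally skips the notification when no `onTerminalClose` callback is supplied; this sketch leaves that check to the caller.
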

## Apache config requirement

`ProxyTimeout` must be ≥ 60s on the Apache vhost (≥ 3× the heartbeat period, for safety margin).

```apache
# video-accessibility uses 600 for large uploads; this also covers WS
ProxyTimeout 600
```

Minimum for WS-only deployments: `ProxyTimeout 60`.
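
The 3× rule is simple arithmetic, and a small check like the following could run in a deploy script or test to catch a heartbeat change that outgrows the proxy timeout. This is a sketch only: `min_proxy_timeout_s` and `proxy_timeout_ok` are hypothetical helpers, and the safety factor of 3 is taken from the requirement stated above.

```python
def min_proxy_timeout_s(heartbeat_s: float, safety_factor: float = 3.0) -> float:
    """Smallest ProxyTimeout (seconds) that keeps the stated safety margin."""
    return heartbeat_s * safety_factor


def proxy_timeout_ok(proxy_timeout_s: float, heartbeat_s: float,
                     safety_factor: float = 3.0) -> bool:
    """True when the configured ProxyTimeout meets or exceeds the minimum."""
    return proxy_timeout_s >= min_proxy_timeout_s(heartbeat_s, safety_factor)


minimum = min_proxy_timeout_s(20)          # 20s heartbeat -> 60.0
upload_vhost_ok = proxy_timeout_ok(600, 20)  # the video-accessibility value
too_tight = proxy_timeout_ok(50, 20)         # below 3x the heartbeat
```

With a 20s heartbeat the minimum comes out at 60 seconds, so `ProxyTimeout 600` passes and a value of 50 does not.
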

## Checklist before merging WS changes

- [ ] Client heartbeat < 25s (use 20s; 25s raced the proxy idle timer)
- [ ] Server keepalive frame on idle (≤ 20s)
- [ ] `TERMINAL_CLOSE_CODES` covers 1000, 4001, 4003, 4004, 4403
- [ ] Token-null guard: `if (!accessToken) return` + `accessToken` in dep array
- [ ] Both `/ws/jobs/{id}` AND `/ws/jobs` (list) get org filtering
- [ ] `heartbeatIntervalRef` cleared on close (no interval leak)
- [ ] Toast surfaced on terminal close (not silent disconnect)

## Projects using this pattern

- **video-accessibility**: `useJobStatusWebSocket.ts`, `routes_websockets.py`
- **mod-comms**: triggered the original 2026-03-18 incident