---
title: "WebSocket Keepalive + Terminal Close Codes"
description: "Bidirectional 20s keepalive pattern + terminal close code list to prevent reconnect storms through Apache mod_proxy_wstunnel"
tags: [websocket, apache, react, fastapi, reliability]
created: 2026-05-01
updated: 2026-05-01
projects: [video-accessibility, mod-comms]
---
# WebSocket Keepalive + Terminal Close Codes
## Problem
Apache `mod_proxy_wstunnel` (and GCP HTTP LB) drop idle WebSocket connections. If the client auto-reconnects on every close code, a permanent permission failure causes an infinite reconnect storm: battery drain + auth rate-limit exhaustion.
**Mod Comms incident (2026-03-18):** heartbeat at 25s was not safe through Apache — every second ping raced the proxy idle timer. 30s (the next candidate) missed the timer entirely.
## Solution: two parts
### 1. Bidirectional 20s keepalive
**Client (React):** send a ping frame every 20 000 ms.
```ts
// useJobStatusWebSocket.ts
const HEARTBEAT_MS = 20_000; // was 30_000 — lowered after Mod Comms incident
heartbeatIntervalRef.current = setInterval(() => {
  if (wsRef.current?.readyState === WebSocket.OPEN) {
    wsRef.current.send('ping');
  }
}, HEARTBEAT_MS);
```
**Server (FastAPI):** if no client message arrives for 20s, emit a `keepalive` frame.
```python
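# routes_websockets.py
import asyncio

# Inside the WebSocket route handler, after the connection is accepted:
while True:
    try:
        # Wait up to 20s for a client frame (the client pings every 20s).
        msg = await asyncio.wait_for(websocket.receive_text(), timeout=20.0)
        if msg == "ping":
            await websocket.send_json({"type": "pong"})
    except asyncio.TimeoutError:
        # Client was silent for 20s: send a frame so the proxy sees traffic.
        await websocket.send_json({"type": "keepalive"})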
```
Both sides generate traffic every ≤20s → Apache idle timer never fires.
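Because the server now emits `pong` and `keepalive` frames, the client's message handler presumably needs to skip them before treating a payload as a job update. A minimal sketch of such a filter follows; `isControlFrame` and `CONTROL_FRAME_TYPES` are hypothetical names, not taken from `useJobStatusWebSocket.ts`, and only the two frame types shown above are assumed.

```ts
// Hypothetical helper (assumption): filters out the server's liveness frames.
// Only the "pong" and "keepalive" types come from the server snippet above.
const CONTROL_FRAME_TYPES = new Set<string>(["pong", "keepalive"]);

function isControlFrame(raw: string): boolean {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) {
      return false;
    }
    const type = (parsed as { type?: unknown }).type;
    return typeof type === "string" && CONTROL_FRAME_TYPES.has(type);
  } catch {
    // Not valid JSON, so it cannot be one of the JSON control frames.
    return false;
  }
}
```

In an `onmessage` handler this would be an early return, for example `if (isControlFrame(event.data)) return;`, placed before any state update.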
### 2. Terminal close codes — no reconnect
```ts
const TERMINAL_CLOSE_CODES = new Set([
  1000, // normal closure
  4001, // unauthenticated
  4003, // forbidden
  4004, // not found
  4403, // org access denied (cross-tenant)
]);

// In handleClose:
const isTerminal = TERMINAL_CLOSE_CODES.has(event.code);
if (!isTerminal) {
  scheduleReconnect();
}
if (isTerminal && event.code !== 1000 && onTerminalClose) {
  onTerminalClose(event.code, event.reason); // surface via toast
}
```
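This note does not show `scheduleReconnect`. One possible shape for the delay calculation is sketched below, assuming capped exponential backoff for non-terminal codes. The backoff policy, the function name `nextReconnectDelayMs`, and the 1s base and 30s cap are all assumptions made for illustration, not the projects' actual values.

```ts
// Same terminal codes as listed above, repeated so the sketch is self-contained.
const TERMINAL_CODES = new Set<number>([1000, 4001, 4003, 4004, 4403]);

// Hypothetical helper (assumption): returns null when the close code is
// terminal (do not reconnect), otherwise a capped exponential delay in ms.
function nextReconnectDelayMs(
  code: number,
  attempt: number,
  baseMs: number = 1_000, // assumed base delay
  capMs: number = 30_000, // assumed upper bound
): number | null {
  if (TERMINAL_CODES.has(code)) {
    return null;
  }
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(capMs, baseMs * 2 ** safeAttempt);
}
```

A caller would skip `setTimeout` entirely when the result is `null`, so a permission failure such as 4003 never re-enters the reconnect loop.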
## Apache config requirement
`ProxyTimeout` must be ≥ 60s on the Apache vhost (≥ 3× the heartbeat period for safety margin).
```apache
# video-accessibility uses 600 for large uploads; this also covers WS.
# (Apache does not allow a comment on the same line as a directive.)
ProxyTimeout 600
```
Minimum for WS-only deployments: `ProxyTimeout 60`.
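The 3× rule above is simple arithmetic: a 20 000 ms heartbeat gives 3 × 20s = 60s. A small sketch of that check, which could run in a config test, follows; the function names are hypothetical and the factor of 3 is the safety margin stated in this note.

```ts
// Safety margin from this note: ProxyTimeout >= 3x the heartbeat period.
const SAFETY_FACTOR = 3;

// Hypothetical helper (assumption): smallest acceptable ProxyTimeout in seconds.
function minProxyTimeoutSeconds(heartbeatMs: number): number {
  return Math.ceil((heartbeatMs * SAFETY_FACTOR) / 1000);
}

// Hypothetical helper (assumption): true when the configured timeout has enough margin.
function isProxyTimeoutSafe(proxyTimeoutSeconds: number, heartbeatMs: number): boolean {
  return proxyTimeoutSeconds >= minProxyTimeoutSeconds(heartbeatMs);
}
```

With the values in this note, `minProxyTimeoutSeconds(20_000)` is 60, so both `ProxyTimeout 60` and `ProxyTimeout 600` pass the check.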
## Checklist before merging WS changes
- [ ] Client heartbeat < 25s (use 20s; 25s was not safe in the Mod Comms incident)
- [ ] Server keepalive frame on idle (≤ 20s)
- [ ] `TERMINAL_CLOSE_CODES` covers 1000, 4001, 4003, 4004, 4403
- [ ] Token-null guard: `if (!accessToken) return` + `accessToken` in dep array
- [ ] Both `/ws/jobs/{id}` AND `/ws/jobs` (list) get org filtering
- [ ] `heartbeatIntervalRef` cleared on close (no interval leak)
- [ ] Toast surfaced on terminal close (not silent disconnect)
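The interval-leak item in the checklist can be made hard to forget by having the code that starts the heartbeat return its own cleanup function. The sketch below is framework-agnostic; `SocketLike`, `sendPingIfOpen`, and `startHeartbeat` are hypothetical names, not the hook's actual API. In a React hook, the returned function would be called from the close handler and from the effect cleanup.

```ts
// Numeric value of WebSocket.OPEN, inlined so the sketch runs outside a browser.
const WS_OPEN = 1;

// Minimal structural type (assumption) covering what the heartbeat needs.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

// Sends one ping if the socket is open; returns whether a ping was sent.
function sendPingIfOpen(ws: SocketLike | null): boolean {
  if (ws !== null && ws.readyState === WS_OPEN) {
    ws.send("ping");
    return true;
  }
  return false;
}

// Starts the heartbeat and returns a stop function that clears the interval.
function startHeartbeat(
  getSocket: () => SocketLike | null,
  intervalMs: number = 20_000,
): () => void {
  const timer = setInterval(() => {
    sendPingIfOpen(getSocket());
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Passing a getter instead of the socket itself mirrors reading `wsRef.current` on every tick, so a replaced socket is picked up without restarting the timer.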
## Projects using this pattern
- **video-accessibility** — `useJobStatusWebSocket.ts`, `routes_websockets.py`
- **mod-comms** — triggered the original 2026-03-18 incident