Runbook — Accessible Video Processing Platform
Generated: 2026-05-01
Quick Navigation
- Docs Hub
- Infrastructure
- Architecture
- Local Dev Setup
- Deployment
- Service Operations
- Troubleshooting
- Environment Variables
Agent Entry
| Signal | Value |
|---|---|
| Purpose | Step-by-step procedures for running, deploying, and troubleshooting the platform |
| Read When | Local setup, deployment, restart, or incident diagnosis |
| Skip When | You need architecture understanding → architecture.md; inventory → infrastructure.md |
| Canonical | Yes |
| Next Docs | Infrastructure, Architecture |
| Primary Sources | scripts/run-local.sh, docker-compose.yml, .env.example |
1. Local Development Setup
Prerequisites
- Docker Desktop (with docker compose v2)
- Node.js 20+ and npm
- GCP credentials JSON at secrets/gcp-credentials.json
- .env.local file (copy from .env.example and fill in the secrets)
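A quick preflight check can catch missing prerequisites before the first ./scripts/run-local.sh run. The Python sketch below is an illustration, not a script shipped in scripts/. The helper names are hypothetical, and the sketch only checks that the two files listed above exist and that the docker, node and npm executables are on PATH:

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Files this runbook lists as local prerequisites, relative to the repo root.
REQUIRED_FILES = ("secrets/gcp-credentials.json", ".env.local")
# Executables used by the local workflow (docker compose v2 is a docker subcommand).
REQUIRED_TOOLS = ("docker", "node", "npm")


def missing_files(repo_root: str | Path) -> list[str]:
    """Return the required files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


def missing_tools() -> list[str]:
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
```

Run from the repository root, missing_files(".") + missing_tools() should return an empty list. The sketch does not check the Node.js version; node --version covers that.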
Backend (Docker)
# Start all backend services (API, workers, MongoDB, Redis)
./scripts/run-local.sh
# Force image rebuild after code changes
./scripts/run-local.sh --rebuild
# Stop all services
./scripts/run-local.sh --stop
# Restart
./scripts/run-local.sh --restart
The script uses docker-compose.yml + docker-compose.local.yml with .env.local.
After startup:
- API: http://localhost:8012
- Swagger UI: http://localhost:8012/docs
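The containers may need a few seconds before they accept connections. A minimal polling sketch follows. It assumes the Swagger UI path /docs is a reasonable readiness probe, because no dedicated health endpoint is documented here. wait_for_http is a hypothetical helper:

```python
from __future__ import annotations

import http.client
import time
import urllib.request

# Bypass any HTTP proxy from the environment: the target is a local container.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll url until it answers without an error status, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # The default opener raises HTTPError for 4xx/5xx, so entering the block means success.
            with _OPENER.open(url, timeout=interval_s):
                return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError and socket errors all subclass OSError;
            # HTTPException covers malformed responses while the API boots.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, wait_for_http("http://localhost:8012/docs") returns True once the API serves the Swagger UI, or False after a minute.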
Frontend (Vite dev server)
cd frontend
npm install
npm run dev
Frontend runs on http://localhost:5173 by default.
Run Migrations
docker exec -it accessible-video-api python migrate.py
Create Test Users
docker exec -it accessible-video-api python create_test_users.py
2. Deployment (optical-web-1)
RULE: Never SSH into optical-web-1 or run commands on it without explicit user instruction.
Deploy Script
./scripts/deploy-dev.sh
Frontend Build
./scripts/build-frontend.sh
Builds the React SPA and copies dist/ to the nginx serving directory.
Production Environment File
Production uses the .env file on optical-web-1. Key differences from .env.example:
| Variable | Production value |
|---|---|
| APP_ENV | production |
| COOKIE_SECURE | true |
| COOKIE_DOMAIN | optical-dev.oliver.solutions |
| All API keys | Real secret values |
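The production overrides in the table can be checked mechanically. This sketch assumes a simple KEY=VALUE .env format, with no multi-line values and no variable expansion. parse_env and production_mismatches are illustrative names, and the expected values are copied from the table above:

```python
from __future__ import annotations

# Production values from the table above (the .env on optical-web-1).
PRODUCTION_EXPECTED = {
    "APP_ENV": "production",
    "COOKIE_SECURE": "true",
    "COOKIE_DOMAIN": "optical-dev.oliver.solutions",
}


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips matching quotes.

    Not a full dotenv implementation (no multi-line values, no ${VAR} expansion).
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def production_mismatches(env: dict[str, str]) -> dict[str, str | None]:
    """Map each expected key to its actual value (None if unset) when it differs."""
    return {
        key: env.get(key)
        for key, expected in PRODUCTION_EXPECTED.items()
        if env.get(key) != expected
    }
```

production_mismatches(parse_env(open(".env").read())) returns {} when all three overrides are in place. Under the rule above, run this against the file on optical-web-1 only when explicitly instructed to.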
3. Service Operations
View Logs
docker logs accessible-video-api -f --tail=100
docker logs accessible-video-worker -f --tail=100
docker logs accessible-video-tts-worker -f --tail=100
docker logs accessible-video-ffmpeg-worker -f --tail=100
docker logs accessible-video-whisper-worker -f --tail=100
Restart a Single Service
docker compose restart api
docker compose restart worker
docker compose restart tts-worker
docker compose restart ffmpeg-worker
docker compose restart whisper-worker
Restart All Services
docker compose down && docker compose up -d
Rebuild a Single Service
docker compose build api && docker compose up -d api
docker compose build worker && docker compose up -d worker
Check Running Services
docker compose ps
Check Queue Depths
# Via API (requires admin token)
GET /api/v1/production/queue-stats
# Via Redis CLI
docker exec -it accessible-video-redis redis-cli llen celery
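For scripted monitoring, the queue-stats call can be issued with the standard library. This is a sketch that assumes the admin token is sent in an Authorization: Bearer header. It makes no assumption about the JSON shape of the response:

```python
from __future__ import annotations

import json
import urllib.request


def queue_stats_request(api_base: str, token: str) -> urllib.request.Request:
    """Build GET /api/v1/production/queue-stats for an admin user.

    Assumption: the admin access token is accepted as 'Authorization: Bearer <token>'.
    """
    url = api_base.rstrip("/") + "/api/v1/production/queue-stats"
    return urllib.request.Request(
        url, method="GET", headers={"Authorization": f"Bearer {token}"}
    )


def fetch_queue_stats(api_base: str, token: str, timeout_s: float = 10.0) -> object:
    """Send the request and return the decoded JSON body (shape not documented here)."""
    with urllib.request.urlopen(queue_stats_request(api_base, token), timeout=timeout_s) as resp:
        return json.load(resp)
```

Locally, this would be fetch_queue_stats("http://localhost:8012", token).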
4. Troubleshooting
TTS Worker Crash Loop (Memory)
Symptom: tts-worker container restarts; OOM errors in logs.
Cause: TTS_WORKER_CONCURRENCY × per-process memory exceeds available RAM.
Fix: Lower TTS_WORKER_CONCURRENCY in .env (recommended: 2 for 512 MB containers), then:
docker compose stop tts-worker
# edit .env: TTS_WORKER_CONCURRENCY=2
docker compose up -d tts-worker
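The sizing rule behind this fix is simple arithmetic: concurrency × per-process memory, plus a reserve, must fit within the container limit. The per-process figure in the example is a hypothetical measurement (for instance, read from docker stats), not a documented value for this service:

```python
def max_safe_concurrency(container_mb: int, per_process_mb: int, reserve_mb: int = 0) -> int:
    """Largest TTS_WORKER_CONCURRENCY whose combined memory fits the container limit.

    per_process_mb should be measured, e.g. from docker stats while one task runs;
    reserve_mb leaves room for the Celery parent process and memory spikes.
    Returns 0 if even a single process would not fit.
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = container_mb - reserve_mb
    return max(usable_mb // per_process_mb, 0)
```

Take a 512 MB limit, an assumed 200 MB per process and a 64 MB reserve: (512 - 64) // 200 = 2, which matches the recommended setting.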
Whisper Worker OOM
Symptom: whisper-worker killed with exit code 137.
Cause: Whisper large-v3 requires ~4–6 GB RAM; container limit is 8 GB.
Fix: Ensure host has sufficient free RAM, or switch to Cloud Run mode via WHISPER_SERVICE_URL.
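Exit code 137 is 128 + 9: the process was killed by signal 9 (SIGKILL). The kernel OOM killer sends that signal, although anything else that sends SIGKILL, such as docker kill, produces the same code. To see whether Docker recorded an OOM kill, run docker inspect --format '{{.State.OOMKilled}}' accessible-video-whisper-worker. The helper below is a small sketch that decodes such exit codes:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code; values above 128 mean 'killed by signal code - 128'."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"
```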
Stuck Jobs
Symptom: Job stays in ingesting or ai_processing indefinitely.
Steps:
- Check worker logs for errors
- Admin API: POST /api/v1/admin/maintenance/reprocess-job/{job_id}
- Or: POST /api/v1/jobs/{job_id}/retry
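"Indefinitely" is hard to act on, and a concrete threshold makes stuck jobs easier to flag from a script or dashboard. In this sketch, both the one-hour threshold and the per-job last-updated timestamp are hypothetical, because this runbook does not document the job schema. The two statuses are the ones named above:

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Statuses this runbook names as candidates for a stuck job.
IN_FLIGHT_STATUSES = {"ingesting", "ai_processing"}


def is_stuck(
    status: str,
    updated_at: datetime,
    now: datetime | None = None,
    threshold: timedelta = timedelta(hours=1),  # illustrative, tune per video length
) -> bool:
    """True if a job has stayed in an in-flight status for longer than threshold.

    Hypothetical: assumes a timezone-aware 'last updated' timestamp per job.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return status in IN_FLIGHT_STATUSES and now - updated_at > threshold
```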
MongoDB Connection Failure
Symptom: API returns 500; logs show ServerSelectionTimeoutError.
Steps:
- Check the mongodb container status with docker compose ps
- Inspect recent logs with docker logs accessible-video-mongodb --tail=50
- Confirm MONGODB_URI in .env matches the running container
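Inside the compose network, the URI host must be the MongoDB service name. From inside the API container, localhost refers to that container, not to MongoDB. This sketch extracts the host and port so they can be compared with docker compose ps. It handles single-host mongodb:// URIs only, and the service name mongodb in the example is an assumption based on the container name above:

```python
from __future__ import annotations

from urllib.parse import urlsplit


def mongo_target(uri: str) -> tuple[str, int]:
    """Return (host, port) from a single-host mongodb:// URI; port defaults to 27017.

    Not valid for mongodb+srv:// or multi-host replica-set URIs.
    """
    parts = urlsplit(uri)
    if parts.scheme != "mongodb":
        raise ValueError(f"expected a mongodb:// URI, got scheme {parts.scheme!r}")
    if "," in parts.netloc:
        raise ValueError("multi-host URIs are not handled by this sketch")
    return parts.hostname or "localhost", parts.port or 27017
```

If mongo_target(os.environ["MONGODB_URI"]) reports ("localhost", 27017) inside the API container, that mismatch is the likely cause of ServerSelectionTimeoutError.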
Redis Connection Failure
Symptom: Celery tasks not executing; redis.exceptions.ConnectionError in logs.
Steps:
- Check that Redis responds with docker exec -it accessible-video-redis redis-cli ping (should return PONG)
- Restart Redis with docker compose restart redis
- Restart the workers with docker compose restart worker tts-worker ffmpeg-worker whisper-worker
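When redis-cli is not at hand, the same liveness check can be made over a raw socket with the Redis protocol (RESP), in which a PING request is answered with +PONG. This sketch assumes Redis is reachable at the given host and port; from the host, that requires the port to be published, which this runbook does not confirm. It also assumes no AUTH is configured, because an authentication error reply makes it return False:

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout_s: float = 2.0) -> bool:
    """Send a RESP-encoded PING and return True if the server answers +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(b"*1\r\n$4\r\nPING\r\n")  # array of one bulk string: "PING"
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        # Connection refused, DNS failure or timeout: Redis is not reachable.
        return False
```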
GCS Access Denied
Symptom: 403 Forbidden from GCS; files not uploading.
Steps:
- Verify secrets/gcp-credentials.json exists and is bind-mounted
- Confirm the service account has Storage Object Admin on GCS_BUCKET
- Check GCP_PROJECT_ID and GCS_BUCKET in .env
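Some of these checks can be automated. This sketch validates only the shape of the key file; type, client_email and project_id are standard fields of a service-account key. It cannot confirm IAM roles such as Storage Object Admin. A project_id that differs from GCP_PROJECT_ID is a hint to investigate rather than proof of an error, since a service account can be granted access to a bucket in another project:

```python
from __future__ import annotations

import json
from pathlib import Path


def check_service_account(path: str | Path, expected_project: str) -> list[str]:
    """Return problems with a service-account key file; an empty list means it looks sane."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist (is the secrets/ bind mount in place?)"]
    try:
        data = json.loads(p.read_text())
    except json.JSONDecodeError as exc:
        return [f"{p} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{p} does not contain a JSON object"]
    problems = []
    if data.get("type") != "service_account":
        problems.append(f"type is {data.get('type')!r}, expected 'service_account'")
    if not data.get("client_email"):
        problems.append("client_email is missing")
    if data.get("project_id") != expected_project:
        problems.append(
            f"project_id {data.get('project_id')!r} differs from GCP_PROJECT_ID {expected_project!r}"
        )
    return problems
```

On the host, the path is secrets/gcp-credentials.json. Inside the containers, it is the GOOGLE_APPLICATION_CREDENTIALS path, /secrets/gcp-credentials.json by default.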
Celery Worker Not Processing Queue
Symptom: Jobs queued but workers idle.
Steps:
- Check that the worker containers are running with docker compose ps
- Check worker logs for import errors at startup
- Verify CELERY_BROKER_URL resolves to Redis within the compose network
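A common cause of idle workers is a broker URL that works from the host but not from inside a container, because localhost inside the worker container is the container itself. The sketch below shows this check. It assumes the redis:// scheme used by the defaults in the environment table, and broker_url_problems is a hypothetical helper:

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hostnames that, inside a container, point back at the container itself.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def broker_url_problems(broker_url: str, redis_url: str | None = None) -> list[str]:
    """Flag CELERY_BROKER_URL values that cannot reach Redis from inside compose."""
    parts = urlsplit(broker_url)
    problems = []
    if parts.scheme not in {"redis", "rediss"}:
        problems.append(f"scheme {parts.scheme!r} is not redis:// or rediss://")
    if parts.hostname in LOOPBACK_HOSTS:
        problems.append(
            f"host {parts.hostname!r} is the worker container itself; use the compose service name"
        )
    if redis_url is not None and broker_url != redis_url:
        problems.append("CELERY_BROKER_URL differs from REDIS_URL")
    return problems
```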
WebSocket Disconnects / Reconnect Storms (optical-web-1)
Symptom: Users experience frequent WebSocket disconnections followed by rapid reconnect attempts visible in browser DevTools Network tab.
Root cause: Apache mod_proxy_wstunnel on optical-web-1 has a ProxyTimeout that drops idle WebSocket connections. The client ping interval (20 s) and the server keepalive frame (20 s) are designed to prevent this, but only if Apache's timeout is comfortably above 20 s.
Recommended Apache config (verify with DevOps before applying):
# In the VirtualHost block for the API
ProxyTimeout 60
Do not set ProxyTimeout below 30 s. The Mod Comms 2026-03-18 incident showed that 25 s was insufficient through mod_proxy_wstunnel — the idle timer fires on the proxy side before the client ping arrives. 60 s provides a comfortable margin above the 20 s bidirectional keepalive cadence.
Verification after change:
- Open DevTools → Network → WS tab
- Connect to any job and let it sit idle for 2 minutes
- Confirm that no close frames and no reconnect attempts appear
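The configured value can also be checked against the 30 s floor before and after the change. This sketch scans an Apache config snippet for ProxyTimeout directives. It does not follow Include files or evaluate per-ProxyPass timeout= parameters. When no ProxyTimeout is set, Apache falls back to the value of Timeout:

```python
from __future__ import annotations

import re

# Floor from this runbook: 20 s keepalive cadence plus margin; 25 s proved too low.
MIN_PROXY_TIMEOUT_S = 30

# Apache directive names are case-insensitive; match one directive per line.
_PROXY_TIMEOUT = re.compile(r"^\s*ProxyTimeout\s+(\d+)\s*$", re.IGNORECASE | re.MULTILINE)


def proxy_timeout_problems(apache_conf: str) -> list[str]:
    """Check every ProxyTimeout directive found in an Apache config snippet."""
    values = [int(v) for v in _PROXY_TIMEOUT.findall(apache_conf)]
    if not values:
        return ["no ProxyTimeout directive found (Apache falls back to Timeout)"]
    return [
        f"ProxyTimeout {v} is below the {MIN_PROXY_TIMEOUT_S} s floor"
        for v in values
        if v < MIN_PROXY_TIMEOUT_S
    ]
```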
5. Environment Variables
Copy from .env.example. Variables marked Yes in the Required column must be set; the rest are optional.
| Variable | Default | Required | Description |
|---|---|---|---|
| APP_ENV | dev | Yes | dev or production |
| API_BASE_URL | — | Yes | Public API base URL |
| JWT_SECRET | — | Yes | Random secret; rotation invalidates all sessions |
| JWT_ALG | HS256 | No | JWT signing algorithm |
| JWT_ACCESS_TTL_MIN | 240 | No | Access token TTL (minutes) |
| JWT_REFRESH_TTL_DAYS | 7 | No | Refresh token TTL (days) |
| COOKIE_DOMAIN | optical-dev.oliver.solutions | Yes | Refresh cookie domain |
| COOKIE_SECURE | true | No | Set false for local HTTP |
| COOKIE_SAMESITE | Lax | No | |
| MONGODB_URI | — | Yes | MongoDB connection string |
| MONGODB_DB | accessible_video | No | Database name |
| REDIS_URL | redis://redis:6379/0 | Yes | |
| CELERY_BROKER_URL | redis://redis:6379/0 | Yes | Same as REDIS_URL |
| CELERY_RESULT_BACKEND | redis://redis:6379/0 | Yes | |
| GCP_PROJECT_ID | — | Yes | GCP project ID |
| GCS_BUCKET | accessible-video | Yes | GCS bucket name |
| GOOGLE_APPLICATION_CREDENTIALS | /secrets/gcp-credentials.json | Yes | Path to service account JSON |
| GEMINI_API_KEY | — | Yes | Gemini 2.5 Pro API key |
| TRANSLATE_API_KEY | — | No | Google Translate API key |
| ELEVENLABS_API_KEY | — | No | ElevenLabs API key |
| GOOGLE_TTS_CREDENTIALS | /secrets/gcp-credentials.json | No | Separate TTS credentials if needed |
| SENDGRID_API_KEY | — | No | SendGrid API key |
| EMAIL_FROM | noreply@optical-dev.oliver.solutions | No | Sender address |
| CLIENT_BASE_URL | — | No | Frontend URL for email links |
| AZURE_CLIENT_ID | — | No | Microsoft SSO client ID |
| AZURE_AUTHORITY | — | No | Microsoft tenant authority URL |
| AZURE_REDIRECT_URI | — | No | Microsoft OIDC redirect URI |
| CORS_ORIGINS | localhost variants | Yes | Comma-separated allowed origins |
| SENTRY_DSN | — | No | Sentry DSN |
| OTEL_EXPORTER_OTLP_ENDPOINT | — | No | OpenTelemetry collector endpoint |
| COST_TRACKER_BASE_URL | — | No | AI cost tracker API URL |
| COST_TRACKER_API_KEY | — | No | AI cost tracker API key |
| COST_TRACKER_SOURCE_APP | video-accessibility | No | App identifier |
| COST_TRACKER_ENABLED | true | No | Enable/disable cost tracking |
| WORKER_CONCURRENCY | 8 | No | General worker concurrency |
| TTS_WORKER_CONCURRENCY | 2 | No | TTS worker concurrency |
| FFMPEG_WORKER_CONCURRENCY | 1 | No | FFmpeg worker concurrency |
| WHISPER_WORKER_CONCURRENCY | 1 | No | Whisper worker concurrency |
| FFMPEG_SERVICE_URL | — | No | Cloud Run FFmpeg service URL |
| WHISPER_SERVICE_URL | — | No | Cloud Run Whisper service URL |
| WHISPER_MODEL | medium | No | Whisper model size |
| USE_CELERY_FALLBACK | false | No | Force local Celery instead of Cloud Run |
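A missing variable usually surfaces only at runtime, so comparing .env against .env.example catches drift early. This sketch compares variable names only, not values, and hard-codes the variables marked Yes in the Required column above. The helper names are illustrative:

```python
from __future__ import annotations

import re

# Variables marked Required = Yes in the table above. Presence check only:
# some of them have defaults and may still work when omitted.
REQUIRED = {
    "APP_ENV", "API_BASE_URL", "JWT_SECRET", "COOKIE_DOMAIN", "MONGODB_URI",
    "REDIS_URL", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND", "GCP_PROJECT_ID",
    "GCS_BUCKET", "GOOGLE_APPLICATION_CREDENTIALS", "GEMINI_API_KEY", "CORS_ORIGINS",
}

# KEY= at the start of a line, optionally prefixed by "export".
_KEY = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def env_keys(text: str) -> set[str]:
    """Variable names assigned in a .env-style file; comments and blanks are ignored."""
    keys = set()
    for line in text.splitlines():
        match = _KEY.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def compare_env(example_text: str, actual_text: str) -> dict[str, set[str]]:
    """Keys in .env.example but not in the actual file, and keys only in the actual file."""
    example, actual = env_keys(example_text), env_keys(actual_text)
    return {"missing": example - actual, "extra": actual - example}


def missing_required(actual_text: str) -> list[str]:
    """Required variables (per the table) that the actual file never assigns."""
    return sorted(REQUIRED - env_keys(actual_text))
```

Pass it the contents of .env.example and .env (or .env.local). An "extra" key may be a new variable that still needs a row in this table.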
6. Rollback
Code Rollback
Check out the previous commit and rebuild:
git log --oneline -10
git checkout <previous-commit>
docker compose build && docker compose up -d
JWT Secret Rotation
- Generate a new secret: openssl rand -hex 32
- Update JWT_SECRET in .env
- Restart the API: docker compose restart api
- All existing sessions are invalidated; users must log in again
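The first two steps can be scripted. secrets.token_hex(32) yields 64 hex characters, the same format as openssl rand -hex 32. rotate_jwt_secret is a hypothetical helper that assumes the variable is written as JWT_SECRET=value on its own line:

```python
import secrets


def new_jwt_secret(num_bytes: int = 32) -> str:
    """Random hex secret; 32 bytes gives 64 hex characters, like openssl rand -hex 32."""
    return secrets.token_hex(num_bytes)


def rotate_jwt_secret(env_text: str, new_secret: str) -> str:
    """Return env_text with the JWT_SECRET line replaced, or appended if absent.

    Assumes the variable is written as JWT_SECRET=value on its own line.
    """
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("JWT_SECRET="):
            lines[i] = f"JWT_SECRET={new_secret}"
            break
    else:
        lines.append(f"JWT_SECRET={new_secret}")
    return "\n".join(lines) + "\n"
```

After writing the result back to .env, the API restart step still applies, and every session is invalidated.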
Maintenance
Last Updated: 2026-05-01
Update Triggers:
- New script added to scripts/
- Deployment target changes
- New environment variable required
- New Docker service added
Verification:
- ./scripts/run-local.sh flags match the actual script
- Environment variable table is complete vs .env.example
- Worker env var names match docker-compose.yml
- Troubleshooting container names match compose service names
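The last verification item can be checked with a short script. It assumes the naming convention visible throughout this runbook, where the container name is accessible-video- plus the compose service name. The service list can come from docker compose config --services, and the runbook's file path is not stated here:

```python
from __future__ import annotations

import re

# Convention visible in this runbook: container name = "accessible-video-" + service.
PREFIX = "accessible-video-"
_CONTAINER = re.compile(r"\baccessible-video-([a-z0-9]+(?:-[a-z0-9]+)*)")


def unknown_containers(runbook_text: str, services: set[str]) -> set[str]:
    """Container names mentioned in the runbook that map to no compose service."""
    mentioned = set(_CONTAINER.findall(runbook_text))
    return {PREFIX + name for name in mentioned - services}
```

An empty result means every docker logs and docker exec target in the runbook corresponds to a current service.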