dow-prod-tracker/docker-compose.yml
DJP 13e069d72c Hardening: Prisma pool bump + webhook rate limiting
Two production-readiness fixes called out in the scaling review
for ~40 concurrent users + daily upstream webhook traffic:

1. Prisma connection pool
   DATABASE_URL now carries ?connection_limit=20&pool_timeout=10
   both in docker-compose.yml (prod) and .env.example (local).
   Prisma's default is cpus*2+1 (~5-9 inside a container), which can
   be exhausted at peak when mutations + TanStack Query polling
   coincide. Postgres max_connections defaults to 100, so 20 × a
   couple of app replicas leaves headroom.

2. Webhook rate limiter
   New src/lib/webhooks/rate-limit.ts — in-memory sliding-window
   limiter keyed on "<scope>:<ip>". 100 req/min per IP per webhook
   (omg / deliverables / briefs). Applied to all three POST
   handlers; over the limit returns 429 with a Retry-After header.
   Dev-mode bypass honours the matching *_WEBHOOK_ALLOW_INSECURE
   env so stub testing isn't throttled.

   Single-process only — swap the Map for Redis if we scale to
   multiple Next.js instances. Single-instance dow-prod-tracker on
   optical-dev is the target today, so in-memory is sufficient.
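
A minimal sketch of what an in-memory sliding-window limiter along these lines could look like. This is not the contents of src/lib/webhooks/rate-limit.ts: the names `checkRateLimit`, `RateLimitResult`, and `hits` are hypothetical, and the dev-mode bypass is left out. Only the key shape ("<scope>:<ip>"), the 100 req/min budget, and the Retry-After behaviour come from the description above.

```typescript
// Hypothetical sketch: names and structure are illustrative assumptions.
type RateLimitResult =
  | { allowed: true }
  | { allowed: false; retryAfterSeconds: number };

const WINDOW_MS = 60_000; // one-minute window
const MAX_REQUESTS = 100; // per IP, per webhook scope

// "<scope>:<ip>" -> timestamps (ms) of accepted requests still inside the window.
// Lives in process memory, so it is only correct for a single app instance.
const hits = new Map<string, number[]>();

function checkRateLimit(
  scope: string,
  ip: string,
  now: number = Date.now(),
): RateLimitResult {
  const key = `${scope}:${ip}`;
  const windowStart = now - WINDOW_MS;

  // Drop timestamps that have slid out of the window.
  const recent = (hits.get(key) ?? []).filter((t) => t > windowStart);

  if (recent.length >= MAX_REQUESTS) {
    hits.set(key, recent);
    // A slot frees up when the oldest accepted request leaves the window.
    const oldest = recent[0] ?? now;
    const retryAfterMs = oldest + WINDOW_MS - now;
    return {
      allowed: false,
      retryAfterSeconds: Math.max(1, Math.ceil(retryAfterMs / 1000)),
    };
  }

  recent.push(now);
  hits.set(key, recent);
  return { allowed: true };
}
```

A POST handler would call `checkRateLimit("omg", ip)` before doing any work and, when `allowed` is false, respond with 429 and a `Retry-After` header set to `retryAfterSeconds`. Keys for idle IPs are never evicted in this sketch, so a long-running process would also want a periodic sweep.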

Also updated INTEGRATION.md with a rate-limiting section so the
upstream integrator knows what to expect + how to handle 429s.
2026-04-21 17:01:19 -04:00
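
For the upstream side, the 429 handling mentioned in the commit message might look something like the sketch below. It is an assumption about how a caller could behave, not what INTEGRATION.md prescribes: `retryDelayMs` is a hypothetical helper that turns a `Retry-After` header value (either delta-seconds or an HTTP-date, the two forms HTTP allows) into a wait in milliseconds.

```typescript
// Hypothetical client-side helper: how long to wait before retrying a 429.
function retryDelayMs(
  retryAfter: string | null,
  fallbackMs: number = 1000,
  now: number = Date.now(),
): number {
  // Header missing: fall back to a fixed delay.
  if (retryAfter === null) return fallbackMs;

  const value = retryAfter.trim();

  // Form 1: delta-seconds, e.g. "30".
  if (/^\d+$/.test(value)) return Number(value) * 1000;

  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const at = Date.parse(value);
  if (!Number.isNaN(at)) return Math.max(0, at - now);

  // Unparseable header: fall back rather than retrying immediately.
  return fallbackMs;
}
```

A caller would sleep for `retryDelayMs(response.headers.get("Retry-After"))` after a 429 and then resend the same payload, ideally with a cap on the number of attempts.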

name: dow-prod-tracker

services:
  # ─── PostgreSQL with pgvector ───────────────────────────
  db:
    image: pgvector/pgvector:pg17
    restart: unless-stopped
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${DB_PASSWORD:-postgres}
      POSTGRES_DB: dow_prod_tracker
    # Host port is overridable via DB_HOST_PORT env var — deploy.sh auto-picks
    # a free one if 5492 is taken on the host. The container-internal port
    # (5432) never changes — the app connects to db:5432 over the Docker
    # network and doesn't care what host port (if any) is mapped.
    ports:
      - "${DB_HOST_PORT:-5492}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./docker/db-init.sql:/docker-entrypoint-initdb.d/01-pgvector.sql:ro
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  # ─── Next.js app ───────────────────────────────────────
  app:
    build:
      context: .
      dockerfile: Dockerfile
    restart: unless-stopped
    # Host port is overridable via APP_HOST_PORT env var — deploy.sh auto-picks
    # a free one if 3002 is taken, and writes the chosen port into the Apache
    # reverse-proxy config (apache/dow-prod-tracker.conf) at the same time.
    ports:
      - "${APP_HOST_PORT:-3002}:3000"
    environment:
      # DATABASE_URL tuning knobs matter at ~40 concurrent users:
      #   connection_limit — how many pooled connections Prisma will
      #     open per app instance. Default is cpus*2+1 (~5-9 inside a
      #     container), which can run out at peak when mutations + query
      #     polling coincide. 20 gives plenty of headroom for this scale.
      #   pool_timeout — seconds to wait for a free connection before
      #     failing the request (default 10). 10s matches the default
      #     and stays explicit.
      # Postgres side: default max_connections is 100, so 20 × a few
      # app replicas is well below the ceiling.
      DATABASE_URL: postgresql://postgres:${DB_PASSWORD:-postgres}@db:5432/dow_prod_tracker?schema=public&connection_limit=20&pool_timeout=10
      # Ollama — points to internal GPU server for embeddings + chat fallback
      OLLAMA_HOST: ${OLLAMA_HOST:-http://10.24.42.219:11434}
      OLLAMA_CHAT_HOST: ${OLLAMA_CHAT_HOST:-http://10.24.42.219:11434}
      OLLAMA_CHAT_MODEL: ${OLLAMA_CHAT_MODEL:-gemma4:latest}
      OLLAMA_EMBED_MODEL: ${OLLAMA_EMBED_MODEL:-nomic-embed-text}
      NODE_ENV: production
      AUTH_SECRET: ${AUTH_SECRET}
      AUTH_TRUST_HOST: "true"
      # Azure SPA registration — PKCE in browser, no client secret
      AZURE_CLIENT_ID: ${AZURE_CLIENT_ID}
      AZURE_TENANT_ID: ${AZURE_TENANT_ID}
      AZURE_REDIRECT_URI: ${AZURE_REDIRECT_URI:-}
      CRON_SECRET: ${CRON_SECRET:-change-me}
      API_KEY: ${API_KEY:-}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
      ANTHROPIC_MODEL: ${ANTHROPIC_MODEL:-}
      DEV_BYPASS_AUTH: ${DEV_BYPASS_AUTH:-false}
      DEV_USER_ID: ${DEV_USER_ID:-}
      # OMG webhook (Shashank pending — stub until payload confirmed)
      OMG_WEBHOOK_SECRET: ${OMG_WEBHOOK_SECRET:-}
      OMG_WEBHOOK_ALLOW_INSECURE: ${OMG_WEBHOOK_ALLOW_INSECURE:-false}
      # Auth: Entra SSO stays coded but gated. Flip to "true" post-MVP once redirect URI is live.
      NEXT_PUBLIC_AUTH_ENTRA_ENABLED: ${NEXT_PUBLIC_AUTH_ENTRA_ENABLED:-false}
    volumes:
      - uploads_data:/data/uploads
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "wget -q --spider http://localhost:3000/api/health || exit 1"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 30s

volumes:
  pgdata:
  uploads_data:
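
The headroom claim in the DATABASE_URL comment is simple arithmetic, and a quick sketch makes it easy to re-check if the replica count or pool size changes. `poolBudget` is a hypothetical helper, not part of the repository; the `reserved` default of 3 assumes Postgres's stock `superuser_reserved_connections` setting, and other clients (migrations, ad-hoc psql sessions) would eat into the remainder.

```typescript
// Hypothetical helper: how much of Postgres's connection ceiling the app pools use.
interface PoolBudget {
  used: number; // connections all replicas may open at full pool size
  headroom: number; // connections left for everything else (negative = oversubscribed)
  maxReplicas: number; // replicas that fit at this connection_limit
}

function poolBudget(
  maxConnections: number, // Postgres max_connections (default 100)
  connectionLimit: number, // Prisma connection_limit per app instance
  replicas: number, // number of app instances
  reserved: number = 3, // assumed superuser_reserved_connections default
): PoolBudget {
  const available = maxConnections - reserved;
  const used = connectionLimit * replicas;
  return {
    used,
    headroom: available - used,
    maxReplicas: Math.floor(available / connectionLimit),
  };
}

// Values from the compose file: max_connections 100, connection_limit 20.
const twoReplicas = poolBudget(100, 20, 2);
console.log(twoReplicas); // used 40, headroom 57, maxReplicas 4
```

Under these assumptions, connection_limit=20 fits up to four app instances; a fifth would need either a smaller pool per instance or a higher max_connections.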