| title | description | tags | created | updated |
|---|---|---|---|---|
| Architecture Decision Records (ADR) | Why specific tech choices were made at Oliver Agency: prevents relitigating decisions and documents constraints | | 2026-04-27 | 2026-04-27 |
Architecture Decision Records
Decisions made and why. Prevents relitigating the same choices. Each record: decision, context, alternatives considered, rationale.
Key Takeaways
- Most Oliver stack choices are driven by server constraints (GCP 30s LB timeout) and team familiarity
- Docker Compose is deliberately chosen over k8s for operational simplicity at this scale
- FastAPI over Django/Flask: async performance + auto-generated OpenAPI docs are worth the smaller ecosystem
- HTTP polling over WebSockets is a hard constraint, not a preference
ADR-001: HTTP Polling over WebSockets
Date: 2026-03 (from Mod Comms incident) Status: Active — applies to ALL Oliver projects
Decision: Never use WebSockets for long-running task communication. Use HTTP polling with a job table.
Context: Mod Comms was deployed on GCP behind a load balancer. WebSocket connections were dropped after exactly 30 seconds. The LB timeout is not configurable without GCP support escalation.
Pattern:
```
POST /api/jobs       → {job_id}
GET  /api/jobs/{id}  → {status: pending|done, result?}
Frontend polls every 2s
```
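The pattern above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the Mod Comms implementation: `JobTable` and `poll_until_done` are hypothetical names, the job table is an in-memory dict standing in for a database table, and the two methods stand in for the two HTTP endpoints.

```python
import threading
import time
import uuid
from typing import Any, Callable, Dict, Optional


class JobTable:
    """In-memory stand-in for the job table (assumption: a real project uses a database)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def submit(self, task: Callable[[], Any]) -> str:
        """Equivalent of POST /api/jobs: register the job, start it, return its id at once."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "pending"}
        threading.Thread(target=self._run, args=(job_id, task), daemon=True).start()
        return job_id

    def _run(self, job_id: str, task: Callable[[], Any]) -> None:
        # The long-running work happens here, outside any HTTP request.
        result = task()
        with self._lock:
            self._jobs[job_id] = {"status": "done", "result": result}

    def get(self, job_id: str) -> Optional[Dict[str, Any]]:
        """Equivalent of GET /api/jobs/{id}: a short request returning the current state."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job is not None else None


def poll_until_done(table: JobTable, job_id: str,
                    interval: float = 2.0, timeout: float = 300.0) -> Any:
    """Client-side loop: check the status every `interval` seconds until the job is done."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = table.get(job_id)
        if job is None:
            raise KeyError(job_id)
        if job["status"] == "done":
            return job["result"]
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still pending after {timeout}s")
```

Every request in this shape completes in milliseconds, so none of them comes near the 30-second limit regardless of how long the job itself runs. The sketch keeps only the `pending` and `done` states from the pattern; a failed task would stay `pending` until the client-side timeout fires.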
Applies to: All projects on optical-dev (Apache) and GCP. optical-web-1 (direct systemd) is less affected but polling is still safer.
See wiki/architecture/gcp-deployment-lb-timeout.
ADR-002: Docker Compose over Kubernetes
Date: ~2025 Status: Active
Decision: Single-server Docker Compose for all Oliver project deployments.
Context: Oliver Agency projects are internal tools and client portals, not public-scale services. Each project runs on one server with 1–3 services.
Alternatives: k8s (Minikube, GKE), Docker Swarm, bare systemd.
Rationale:
- k8s adds ~3 days of ops overhead per project for no benefit at this scale
- Docker Compose is understood by the entire team
- Rollbacks: `docker compose up -d` with the previous image tag
- optical-dev already runs 15+ Compose projects without issues
Exceptions: Hotfolder daemons on box-cli-01 use plain systemd (CentOS 7, no Docker).
ADR-003: FastAPI over Django/Flask
Date: ~2024 Status: Active
Decision: FastAPI as the default Python backend framework.
Rationale:
- Async-first: handles concurrent AI API calls without blocking
- Auto-generated OpenAPI docs (`/docs`): zero-effort API documentation
- Pydantic models: input validation + serialization in one place
- Performance: competitive with Node.js for I/O-bound workloads
- Type hints throughout → fewer runtime errors
When to deviate:
- Admin CRUD with lots of forms → Django (but Oliver doesn't have these)
- Very simple one-endpoint proxy → Flask is fine
ADR-004: React + Vite over Vue / Angular / SvelteKit
Date: ~2024 Status: Active
Decision: React 18 + Vite as the standard frontend stack.
Rationale:
- Team familiar with React; no training cost
- Vite: fast HMR, simple `base` config for subpath deploys
- React ecosystem: Shadcn/UI, Zustand, React Query all solid
- TypeScript + Vite: first-class support
When to deviate:
- No interactivity needed → plain HTML/JS (3M Portal, Ferrero AC Tool)
- SSR, image optimization, or complex routing required → Next.js
ADR-005: Azure AD / MSAL as Auth Standard
Date: ~2024 Status: Active
Decision: Azure AD SSO for all Oliver internal authenticated tools.
Context: Oliver Agency has a Microsoft 365 tenant. All employees have Azure AD accounts.
Pattern: MSAL.js PKCE in frontend (delegated flow) + JWKS token validation in FastAPI backend.
Local dev bypass: DISABLE_AUTH=true env var skips auth middleware. Never in production.
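One way to make "never in production" enforceable rather than a convention is to refuse to start when the bypass is set in a production environment. The sketch below assumes a second variable, `APP_ENV`, which is hypothetical and not part of this ADR; only `DISABLE_AUTH` comes from the text above. The function name `auth_enabled` is likewise illustrative.

```python
import os
from typing import Mapping, Optional


def auth_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the auth middleware should run.

    DISABLE_AUTH is the variable named in the ADR. APP_ENV is a hypothetical
    variable used here to tell production apart from local development.
    """
    env = os.environ if env is None else env
    bypass = env.get("DISABLE_AUTH", "").strip().lower() == "true"
    # An unset APP_ENV is treated as production so a missing variable fails safe.
    production = env.get("APP_ENV", "production").strip().lower() == "production"
    if bypass and production:
        # Fail at startup instead of serving unauthenticated traffic.
        raise RuntimeError("DISABLE_AUTH=true is not allowed in production")
    return not bypass
```

Calling this once at application startup means a misconfigured production deploy crashes immediately instead of running without authentication.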
Alternatives: Auth0 (cost, external dependency), custom JWT (reinventing the wheel), Keycloak (infra overhead).
See wiki/tech-patterns/azure-ad-msal-auth.
ADR-006: Cost Tracker on Every AI Project
Date: 2026-04 (ai-cost-tracker launch) Status: Active
Decision: Every Oliver project making AI API calls must integrate ai-cost-tracker with preflight + record.
Context: AI API costs (Gemini, Claude, OpenAI) can spike unpredictably. Without tracking, budget overruns are only discovered on the monthly bill.
Integration cost: ~30 minutes per project (3 env vars + 2 HTTP calls).
Enforcement: preflight() returns allow: false if the budget is exceeded, which stops the call before it is made and prevents runaway costs.
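The preflight + record flow might wrap each AI call roughly as follows. This is a sketch under assumptions: the `CostTracker` interface, its method signatures, the `BudgetExceeded` exception, and `tracked_call` are all hypothetical, and the real ai-cost-tracker API (two HTTP calls) may look different. Only the `allow` field in the preflight response comes from the text above.

```python
from typing import Any, Callable, Dict, Protocol, Tuple


class CostTracker(Protocol):
    """Hypothetical client interface; the real ai-cost-tracker API may differ."""

    def preflight(self, project: str) -> Dict[str, Any]:
        ...

    def record(self, project: str, cost_usd: float) -> None:
        ...


class BudgetExceeded(RuntimeError):
    """Raised when preflight reports that the project budget is used up."""


def tracked_call(tracker: CostTracker, project: str,
                 ai_call: Callable[[], Tuple[Any, float]]) -> Any:
    """Run `ai_call` only if preflight allows it, then record what it cost.

    `ai_call` is assumed to return a (result, cost_usd) pair.
    """
    check = tracker.preflight(project)
    if not check.get("allow", False):
        # Budget exhausted: stop before any money is spent.
        raise BudgetExceeded(f"budget exceeded for {project}")
    result, cost_usd = ai_call()
    tracker.record(project, cost_usd)
    return result
```

Treating a missing `allow` key as a refusal is a design choice in this sketch: if the tracker returns something unexpected, the call is blocked rather than let through.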
See wiki/tech-patterns/cost-tracker-integration.
ADR-007: Apache Single-Vhost Subpath Pattern on optical-dev
Date: 2026-04 (documented from Barclays Banner Builder) Status: Active
Decision: All projects on optical-dev share one Apache vhost. Each project gets a subpath (/project-name/), not a subdomain.
Context: optical-dev has one public IP. Subdomain-per-project requires DNS management and SSL certificates. Subpath requires only Apache config fragments.
Constraints:
- React apps must use `VITE_BASE_PATH` and React Router `basename`
- All API calls must include the subpath prefix
- Include directive order matters — specific paths before catch-alls
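The ordering constraint can be illustrated with a small first-match model. This is Python, not Apache configuration: it assumes first-match-wins evaluation, which is how Apache checks `ProxyPass` rules, and the prefixes and backend labels are made up for the example.

```python
from typing import List, Optional, Tuple

# (path prefix, backend label); both are illustrative, not real optical-dev config.
Rule = Tuple[str, str]


def route(rules: List[Rule], path: str) -> Optional[str]:
    """Return the backend of the first rule whose prefix matches the path."""
    for prefix, backend in rules:
        if path.startswith(prefix):
            return backend
    return None


# Specific paths first: the project API is reachable.
SPECIFIC_FIRST: List[Rule] = [
    ("/project-name/api/", "api-backend"),
    ("/project-name/", "static-frontend"),
    ("/", "default-site"),
]

# Catch-all first: it matches every path, so the rules after it never run.
CATCH_ALL_FIRST: List[Rule] = [
    ("/", "default-site"),
    ("/project-name/api/", "api-backend"),
    ("/project-name/", "static-frontend"),
]
```

With `SPECIFIC_FIRST`, a request for `/project-name/api/jobs` reaches `api-backend`; with `CATCH_ALL_FIRST`, the same request is swallowed by `default-site`.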
See wiki/architecture/optical-dev-server-deploy.
ADR-008: Gemini over GPT for Barclays / GCP Projects
Date: 2026-03 (Mod Comms) Status: Active for GCP-deployed projects
Decision: Prefer Google Gemini as AI provider for projects deployed on GCP.
Rationale: Google-to-Google latency advantage. GCP service account auth is simpler than API key rotation. Gemini Pro + Flash fallback gives cost/quality control.
When to use Claude/OpenAI instead: Client specifies it (PIMCO uses Claude API), or task requires better coding ability, or project is on optical-web-1 / optical-dev (neutral infrastructure).
ADR-009: Node.js Proxy for One2Edit / Simple Portals
Date: ~2024 Status: Active
Decision: Use Node.js + vanilla JS (no framework, no build step) for simple CORS proxy portals.
Context: One2Edit API doesn't support CORS. H&M and 3M portals need to proxy requests to oliver.one2edit.com.
Rationale: No build pipeline = easier to deploy and debug. Vanilla JS works fine for 3-page portals. A Node.js Express proxy is 30 lines.
Pattern: Static files served by Node + /api/* proxied to external API. See wiki/tech-patterns/nodejs-vanilla-proxy.