vault backup: 2026-04-27 18:22:09

Vadym Samoilenko 2026-04-27 18:22:09 +01:00
parent 9c3b6fdb7a
commit 401c22d9ff
18 changed files with 1387 additions and 13 deletions


@ -1,27 +1,34 @@
---
name: "Barclays Banner Builder"
client: "TBD"
client: Barclays
status: active
server: optical-dev
tech: []
tech: [React, TypeScript, Vite, Zustand, FastAPI, Python, PostgreSQL, Alembic, Docker]
local_path: /Users/ai_leed/Documents/Projects/Oliver/Barclays-banner-builder
deploy:
url:
deploy: bash deploy.sh
url: https://optical-dev.oliver.solutions/barclays-banner-builder/
tags:
- project
- client/barclays
- domain/ai
created: 2026-04-17
---
## Overview
> New project — fill in during first session.
AI-assisted banner generation tool for Barclays marketing assets. Workflow: Brief → Edit Variants → Banner Editor → Export (CSV/PDF). Uses Zustand journey store for workflow state — backward navigation allowed, forward steps grayed until completed.
## Tech Stack
- **Frontend:**
- **Backend:**
- **Infrastructure:**
- **Frontend:** React 18 + TypeScript + Vite + Zustand (journey store, Barclays design tokens)
- **Backend:** Python + FastAPI + PostgreSQL + Alembic migrations
- **Auth:** Azure AD (MSAL)
- **Infrastructure:** Docker Compose + Apache subpath on optical-dev (port 8010)
## Deployment
- **Local path:** `/Users/ai_leed/Documents/Projects/Oliver/Barclays-banner-builder`
- **Run locally:** `docker compose up --build`
- **Server:** `ssh optical-dev "cd /opt/barclays-banner-builder && git pull && bash deploy.sh"`
- **URL:** `https://optical-dev.oliver.solutions/barclays-banner-builder/`
- **Apache config:** `/opt/barclays-banner-builder/deploy/apache-barclays.conf`
- **Port:** 8010 (backend API)
## Sessions
### 2026-04-20 Add visual progress indicator showing user


@ -431,3 +431,9 @@ tags: [daily]
- 18:13 | `obsidian-vault`
- **Asked:** Check Obsidian integration and verify wiki index files contain complete information.
- **Done:** Added automation hook to PreCompact so wiki-count-sync.py runs before each compaction to keep article counters in _master-index.md synchronized.
- 18:18 | `aimpress`
- **Asked:** Configure ntfy on homelab and identify current alert destinations.
- **Done:** Investigated migration failure in 0056_user_roles.py and identified field rename issue in version 2026.2.2 requiring intermediate upgrade path.
- 18:21 | `obsidian-vault`
- **Asked:** Check Obsidian integration and verify wiki index files contain complete required information.
- **Done:** Updated master-index with new counters and fixed Barclays Banner Builder note structure.


@ -201,6 +201,18 @@ Review the wiki for:
- Suggest 3-5 new articles that would strengthen the knowledge base
- **Do NOT make changes without confirmation — just report**
### Q&A Auto-Save Protocol
When answering a **complex technical question** during a session (debugging, architecture, how-to), save the answer to `wiki/qa/`:
1. Create `wiki/qa/{topic-slug}.md` with the question as H1, answer as body, `## Key Takeaways` section, and `## Related` wikilinks
2. Add entry to `wiki/qa/_index.md`
3. Update `wiki/_master-index.md` count
**Threshold for saving:** The answer involves non-obvious reasoning, debugging steps, or knowledge that would take >5 minutes to rediscover. Don't save answers to questions with obvious answers.
**Slug format:** `{problem}-{resolution}.md` e.g. `fastapi-cors-docker-fix.md`, `azure-ad-403-wrong-flow.md`
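The slug format above can be sketched as a small helper. This is only an illustration: the function name `qa_slug` and the choice to collapse every non-alphanumeric run into a hyphen are assumptions, not part of the protocol.

```python
import re


def qa_slug(problem: str, resolution: str) -> str:
    """Build a wiki/qa filename in the {problem}-{resolution}.md format."""

    def part(text: str) -> str:
        # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{part(problem)}-{part(resolution)}.md"
```

For example, `qa_slug("Azure AD 403", "wrong flow")` yields `azure-ad-403-wrong-flow.md`, matching the second example in the protocol.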
### Auto-compilation from Sessions
Sessions are captured automatically via Claude Code hooks and compiled to `wiki/concepts/`, `wiki/connections/`, and `wiki/qa/` after 21:00 (9 PM).


@ -21,10 +21,10 @@ This 3-hop pattern works for hundreds of articles without vector search.
| [[wiki/obsidian-rag/_index\|obsidian-rag/]] | Karpathy's LLM wiki method — Obsidian RAG, setup, vs true RAG | 3 |
| [[wiki/projects-overview/_index\|projects-overview/]] | All 42 Oliver Agency projects — grouped by server (optical-web-1, optical-dev, baic, box-cli) | 1 |
| [[wiki/tech-patterns/_index\|tech-patterns/]] | Recurring tech stacks: FastAPI, React/Vite, Next.js, Azure AD, AI, Box, One2Edit, Redis/Celery, cost-tracker | 13 |
| [[wiki/architecture/_index\|architecture/]] | Cross-cutting architectural patterns: Docker Compose, multi-agent AI, GCP timeout, RAG, hotfolder, optical-dev deploy, cost-tracker | 7 |
| [[wiki/client-knowledge/_index\|client-knowledge/]] | Per-client notes for Ford, H&M, L'Oréal (2+ projects each) | 3 |
| [[wiki/architecture/_index\|architecture/]] | Cross-cutting architectural patterns: Docker Compose, multi-agent AI, GCP timeout, RAG, hotfolder, optical-dev deploy, cost-tracker, new-project checklist, troubleshooting playbooks, ADR log | 10 |
| [[wiki/client-knowledge/_index\|client-knowledge/]] | Per-client notes for Ford, H&M, L'Oréal, Barclays, Ferrero, 3M | 6 |
| [[wiki/concepts/_index\|concepts/]] | Atomic knowledge extracted from Claude Code sessions | 45 |
| [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts | 3 |
| [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts: FastAPI+Azure AD+Docker trinity, AI→cost-tracker, Apache+Vite basePath, GCP→REST polling, Box+hotfolder | 8 |
| [[wiki/qa/_index\|qa/]] | Filed answers to queries (saved with `--file-back`) | 0 |
| [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 38 |
| [[wiki/web-agency/_index\|web-agency/]] | AI-assisted website building & selling: Claude Code, Nanobanana 2, Kling, LaunchPath MCP | 9 |


@ -21,12 +21,21 @@ Cross-cutting architectural decisions that appear in multiple Oliver projects.
| [[wiki/architecture/hotfolder-daemon\|hotfolder-daemon]] | Box folder monitoring daemon with systemd | Ford QC, Ford SFTP |
| [[wiki/architecture/optical-dev-server-deploy\|optical-dev-server-deploy]] | optical-dev Apache subpath pattern: single vhost, Include conf, port table, deploy script | All Oliver projects |
| [[wiki/architecture/ai-cost-tracker\|ai-cost-tracker]] | Shared AI cost tracker: Docker Compose, Workspace→Team→Project, preflight/record HTTP API, LiteLLM pricing, hard budget limits | All Oliver projects |
| [[wiki/architecture/new-project-checklist\|new-project-checklist]] | Step-by-step Oliver project setup — repo, Docker Compose, Azure AD, cost tracker, optical-dev deploy | All new projects |
| [[wiki/architecture/troubleshooting-playbooks\|troubleshooting-playbooks]] | Failure → diagnosis → fix for FastAPI, Docker, React/Vite, Azure AD, Apache, PostgreSQL | All Oliver projects |
| [[wiki/architecture/adr-log\|adr-log]] | Architecture Decision Records — why HTTP polling, Docker Compose, FastAPI, Azure AD, cost tracker were chosen | All Oliver projects |
## Key Architectural Decisions
1. **Docker Compose** — default deployment for all multi-service projects on optical-dev
2. **HTTP polling over WebSocket** — mandatory on GCP (30s LB timeout)
2. **HTTP polling over WebSocket** — mandatory on GCP (30s LB timeout) — see [[wiki/architecture/adr-log|ADR-001]]
3. **AI pre-structuring before RAG indexing** — improves retrieval quality
4. **Hotfolder + archive pattern** — prevents reprocessing in Box automations
5. **DEV_AUTH_BYPASS / dev login** — skip Azure AD in local/dev environment, real auth in production
6. **Cost tracking as cross-cutting concern** — every AI call preflight+record via ai-cost-tracker
## Quick Links
- Starting a new project? → [[wiki/architecture/new-project-checklist|new-project-checklist]]
- Something broken? → [[wiki/architecture/troubleshooting-playbooks|troubleshooting-playbooks]]
- Why was X chosen? → [[wiki/architecture/adr-log|adr-log]]


@ -0,0 +1,182 @@
---
title: "Architecture Decision Records (ADR)"
description: "Why specific tech choices were made at Oliver Agency — prevents relitigating decisions and documents constraints"
tags: [architecture, decisions, adr]
created: 2026-04-27
updated: 2026-04-27
---
# Architecture Decision Records
Decisions made and why. Prevents relitigating the same choices. Each record: decision, context, alternatives considered, rationale.
## Key Takeaways
- Most Oliver stack choices are driven by server constraints (GCP 30s LB timeout) and team familiarity
- Docker Compose is deliberately chosen over k8s for operational simplicity at this scale
- FastAPI over Django/Flask: async performance + auto-generated OpenAPI docs are worth the smaller ecosystem
- HTTP polling over WebSockets is a hard constraint, not a preference
---
## ADR-001: HTTP Polling over WebSockets
**Date:** 2026-03 (from Mod Comms incident)
**Status:** Active — applies to ALL Oliver projects
**Decision:** Never use WebSockets for long-running task communication. Use HTTP polling with a job table.
**Context:** Mod Comms was deployed on GCP behind a load balancer. WebSocket connections were dropped after exactly 30 seconds. The LB timeout is not configurable without GCP support escalation.
**Pattern:**
```
POST /api/jobs → {job_id}
GET /api/jobs/{id} → {status: pending|done, result?}
Frontend polls every 2s
```
**Applies to:** All projects on optical-dev (Apache) and GCP. optical-web-1 (direct systemd) is less affected but polling is still safer.
See [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]].
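The client side of the polling pattern can be sketched in Python as below. It is a minimal illustration, not the projects' actual frontend code: `poll_job`, its parameters, and the overall deadline are assumptions added here, and `fetch_status` stands in for whatever issues the short `GET /api/jobs/{id}` request.

```python
import time
from typing import Callable


def poll_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job reports status 'done' or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()  # one short request, well under the 30s LB limit
        if job.get("status") == "done":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the deadline")
        sleep(interval)  # wait before the next poll (2s in the pattern above)
```

Every request completes quickly, so no single connection stays open long enough to hit the load balancer timeout. The overall deadline keeps a stuck job from being polled forever.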
---
## ADR-002: Docker Compose over Kubernetes
**Date:** ~2025
**Status:** Active
**Decision:** Single-server Docker Compose for all Oliver project deployments.
**Context:** Oliver Agency projects are internal tools and client portals, not public-scale services. Each project runs on one server with 1-3 services.
**Alternatives:** k8s (Minikube, GKE), Docker Swarm, bare systemd.
**Rationale:**
- k8s adds ~3 days of ops overhead per project for no benefit at this scale
- Docker Compose is understood by the entire team
- Rollbacks: `docker compose up -d` with previous image tag
- optical-dev already runs 15+ Compose projects without issues
**Exceptions:** Hotfolder daemons on box-cli-01 use plain systemd (CentOS 7, no Docker).
---
## ADR-003: FastAPI over Django/Flask
**Date:** ~2024
**Status:** Active
**Decision:** FastAPI as the default Python backend framework.
**Rationale:**
- Async-first: handles concurrent AI API calls without blocking
- Auto-generated OpenAPI docs (`/docs`) — zero effort API documentation
- Pydantic models: input validation + serialization in one place
- Performance: competitive with Node.js for I/O-bound workloads
- Type hints throughout → fewer runtime errors
**When to deviate:**
- Admin CRUD with lots of forms → Django (but Oliver doesn't have these)
- Very simple one-endpoint proxy → Flask is fine
---
## ADR-004: React + Vite over Vue / Angular / SvelteKit
**Date:** ~2024
**Status:** Active
**Decision:** React 18 + Vite as the standard frontend stack.
**Rationale:**
- Team familiar with React; no training cost
- Vite: fast HMR, simple `base` config for subpath deploys
- React ecosystem: Shadcn/UI, Zustand, React Query all solid
- TypeScript + Vite: first-class support
**When to deviate:**
- No interactivity needed → plain HTML/JS (3M Portal, Ferrero AC Tool)
- Next.js needed → when SSR, image optimization, or complex routing required
---
## ADR-005: Azure AD / MSAL as Auth Standard
**Date:** ~2024
**Status:** Active
**Decision:** Azure AD SSO for all Oliver internal authenticated tools.
**Context:** Oliver Agency has a Microsoft 365 tenant. All employees have Azure AD accounts.
**Pattern:** MSAL.js PKCE in frontend (delegated flow) + JWKS token validation in FastAPI backend.
**Local dev bypass:** `DISABLE_AUTH=true` env var skips auth middleware. Never in production.
**Alternatives:** Auth0 (cost, external dependency), custom JWT (reinventing the wheel), Keycloak (infra overhead).
See [[wiki/tech-patterns/azure-ad-msal-auth|azure-ad-msal-auth]].
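One way to enforce the "never in production" rule is to make the bypass check itself refuse to run there. The sketch below assumes a hypothetical `APP_ENV` variable marking the environment; only `DISABLE_AUTH` comes from this ADR, and the function name is illustrative.

```python
import os
from typing import Mapping


def auth_bypass_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when DISABLE_AUTH=true is set outside production."""
    bypass = env.get("DISABLE_AUTH", "").strip().lower() == "true"
    # APP_ENV is a hypothetical variable name, not one the projects are known to use.
    in_production = env.get("APP_ENV", "").strip().lower() == "production"
    if bypass and in_production:
        raise RuntimeError("DISABLE_AUTH=true is not allowed in production")
    return bypass
```

Calling this once at startup would turn a misconfigured production `.env` into a crash on boot instead of a silently open API.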
---
## ADR-006: Cost Tracker on Every AI Project
**Date:** 2026-04 (ai-cost-tracker launch)
**Status:** Active
**Decision:** Every Oliver project making AI API calls must integrate ai-cost-tracker with preflight + record.
**Context:** AI API costs (Gemini, Claude, OpenAI) can spike unpredictably. Without tracking, budget overruns are only discovered on the monthly bill.
**Integration cost:** ~30 minutes per project (3 env vars + 2 HTTP calls).
**Enforcement:** preflight() returns `allow: false` if budget exceeded — prevents runaway costs.
See [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]].
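The preflight + record flow can be sketched as a wrapper around any AI call. The tracker's real endpoints and payloads are not shown in this record, so the two HTTP calls are passed in as plain callables; `tracked_ai_call`, `BudgetExceeded`, and the callable signatures are hypothetical names used for illustration only.

```python
from typing import Any, Callable


class BudgetExceeded(RuntimeError):
    """Raised when the preflight check reports allow: false."""


def tracked_ai_call(
    preflight: Callable[[], dict],
    record: Callable[[Any], None],
    call_model: Callable[[], Any],
) -> Any:
    """Run call_model only if preflight allows it, then record the result."""
    verdict = preflight()  # HTTP call 1: ask the tracker for permission
    if not verdict.get("allow", False):
        raise BudgetExceeded("ai-cost-tracker preflight returned allow: false")
    response = call_model()  # the actual Gemini / Claude / OpenAI request
    record(response)  # HTTP call 2: report what was spent
    return response
```

Treating a missing `allow` key as a refusal is a design choice in this sketch: a malformed tracker response then blocks the call rather than letting it run untracked.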
---
## ADR-007: Apache Single-Vhost Subpath Pattern on optical-dev
**Date:** 2026-04 (documented from Barclays Banner Builder)
**Status:** Active
**Decision:** All projects on optical-dev share one Apache vhost. Each project gets a subpath (`/project-name/`), not a subdomain.
**Context:** optical-dev has one public IP. Subdomain-per-project requires DNS management and SSL certificates. Subpath requires only Apache config fragments.
**Constraints:**
- React apps must use `VITE_BASE_PATH` and React Router `basename`
- All API calls must include the subpath prefix
- Include directive order matters — specific paths before catch-alls
See [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]].
---
## ADR-008: Gemini over GPT for Barclays / GCP Projects
**Date:** 2026-03 (Mod Comms)
**Status:** Active for GCP-deployed projects
**Decision:** Prefer Google Gemini as AI provider for projects deployed on GCP.
**Rationale:** Google-to-Google latency advantage. GCP service account auth is simpler than API key rotation. Gemini Pro + Flash fallback gives cost/quality control.
**When to use Claude/OpenAI instead:** Client specifies it (PIMCO uses Claude API), or task requires better coding ability, or project is on optical-web-1 / optical-dev (neutral infrastructure).
---
## ADR-009: Node.js Proxy for One2Edit / Simple Portals
**Date:** ~2024
**Status:** Active
**Decision:** Use Node.js + vanilla JS (no framework, no build step) for simple CORS proxy portals.
**Context:** One2Edit API doesn't support CORS. H&M and 3M portals need to proxy requests to `oliver.one2edit.com`.
**Rationale:** No build pipeline = easier to deploy and debug. Vanilla JS works fine for 3-page portals. Node.js express proxy is 30 lines.
**Pattern:** Static files served by Node + `/api/*` proxied to external API. See [[wiki/tech-patterns/nodejs-vanilla-proxy|nodejs-vanilla-proxy]].
---
## Related
- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]]
- [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]]
- [[wiki/architecture/new-project-checklist|new-project-checklist]]
- [[wiki/tech-patterns/_index|tech-patterns]] — all pattern articles


@ -0,0 +1,221 @@
---
title: "New Oliver Project — Setup Checklist"
description: "Step-by-step checklist for starting any new Oliver Agency project — from Bitbucket to deployed on optical-dev"
tags: [architecture, checklist, deployment, setup]
created: 2026-04-27
updated: 2026-04-27
---
# New Oliver Project — Setup Checklist
## Key Takeaways
- Always use Docker Compose — it's the default for all multi-service projects
- No WebSockets on GCP or optical-dev — use HTTP job polling from day 1
- Add cost-tracker from the start if any AI API calls exist
- Azure AD MSAL is the Oliver standard for authenticated tools
- Create Obsidian project note before writing code
---
## Step 0 — Decide on Stack
Use the decision guide from [[wiki/tech-patterns/_index|tech-patterns]]:
```
New project → what stack?
├── Complex AI platform, multi-user → nextjs-fastapi-fullstack
├── Standard tool with UI → fastapi-python-docker + react-vite-typescript
├── Simple client portal / proxy → nodejs-vanilla-proxy
├── Static page, no backend → plain HTML/JS
└── Needs auth? → always azure-ad-msal-auth
└── Has AI calls? → always add cost-tracker-integration
```
---
## Step 1 — Create Obsidian Project Note
Before writing code. This is where context lives.
```bash
# Create folder + note in vault
mkdir -p ~/Library/Mobile\ Documents/iCloud~md~obsidian/Documents/VadymSamoilenko/01\ Projects/{project-name}/
```
Fill in frontmatter: `client`, `status`, `tech`, `local_path`, `server`, `deploy`, `url`.
---
## Step 2 — Repo and Local Folder
```bash
# Oliver uses Bitbucket (or GitHub for some projects)
# Local project root
mkdir /Users/ai_leed/Documents/Projects/Oliver/{project-name}
cd /Users/ai_leed/Documents/Projects/Oliver/{project-name}
git init
# Standard .gitignore
# printf is used because plain `echo` does not expand \n in every shell
printf '%s\n' '.env' '.env.*' '!.env.example' '__pycache__/' 'node_modules/' 'dist/' '.deploy_state/' > .gitignore
```
---
## Step 3 — Project Structure
### FastAPI + React (standard)
```
{project}/
├── backend/
│   ├── app/
│   │   ├── main.py
│   │   ├── routers/
│   │   ├── models.py
│   │   └── database.py
│   ├── requirements.txt
│   ├── Dockerfile
│   └── .env.example
├── frontend/
│   ├── src/
│   ├── package.json
│   ├── vite.config.ts       ← must include base: VITE_BASE_PATH
│   └── .env.example
├── docker-compose.yml
├── docker-compose.prod.yml
├── deploy.sh
└── CLAUDE.md
```
### CLAUDE.md minimum
```markdown
# {Project Name}
- Server: optical-dev | optical-web-1
- Deploy: bash deploy.sh
- Stack: FastAPI (port 8XXX), React (Vite, /project-path/ subpath)
- Local: docker compose up --build
- No WebSockets — use HTTP polling for any async work
```
---
## Step 4 — Docker Compose
```yaml
# docker-compose.yml (local dev)
services:
  api:
    build: ./backend
    ports: ["8010:8000"] # pick free port from optical-dev port table
    env_file: ./backend/.env
    depends_on: [db]
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes: ["pgdata:/var/lib/postgresql/data"]
  frontend:
    build: ./frontend
    ports: ["5173:5173"]
volumes:
  pgdata:
```
For prod (`docker-compose.prod.yml`): bind API to `127.0.0.1:8010:8000` — never expose containers directly.
---
## Step 5 — Azure AD Auth (if needed)
Required for all Oliver internal tools. See [[wiki/tech-patterns/azure-ad-msal-auth|azure-ad-msal-auth]].
**Backend `.env.example`:**
```env
AZURE_TENANT_ID=
AZURE_CLIENT_ID=
AZURE_CLIENT_SECRET=
DISABLE_AUTH=true # local dev bypass
```
**Frontend `.env.example`:**
```env
VITE_AZURE_CLIENT_ID=
VITE_AZURE_TENANT_ID=
VITE_BACKEND_URL=http://localhost:8010
VITE_BASE_PATH=/project-name # for optical-dev subpath
```
---
## Step 6 — Cost Tracker (if AI calls)
Add from day 1. Integration is cheap (3 env vars + 2 HTTP calls) and prevents budget surprises. See [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]].
```env
COST_TRACKER_BASE_URL=https://optical-dev.oliver.solutions/cost-tracker/v1
COST_TRACKER_API_KEY=ct_live_xxx # get from admin UI
COST_TRACKER_SOURCE_APP={project-name}
```
---
## Step 7 — Async Work Pattern (no WebSockets)
GCP load balancer kills connections after 30s. Celery is overkill for simple cases — use the job table pattern:
```python
# POST /api/jobs → creates DB row, starts background task → returns {job_id}
# GET /api/jobs/{id} → returns {status, result}
# Frontend polls every 2s until status == "done"
```
Only add Redis + Celery if: multiple workers needed, task retries needed, or queue length monitoring needed. See [[wiki/tech-patterns/redis-celery-worker-queue|redis-celery-worker-queue]].
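The server side of the job table pattern can be sketched without any framework. The class below is a stand-in under stated assumptions: it keeps jobs in memory, where the checklist calls for a DB row, and it runs work in a thread, where a FastAPI app would more likely use a background task. The `failed` status and all names are additions for illustration.

```python
import threading
import uuid
from typing import Any, Callable, Dict


class JobStore:
    """In-memory stand-in for the job table; a real project would use a DB row per job."""

    def __init__(self) -> None:
        self._jobs: Dict[str, dict] = {}
        self._lock = threading.Lock()

    def submit(self, work: Callable[[], Any]) -> str:
        """What POST /api/jobs would do: register the job, start it, return its id."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "pending", "result": None}
        threading.Thread(target=self._run, args=(job_id, work), daemon=True).start()
        return job_id

    def _run(self, job_id: str, work: Callable[[], Any]) -> None:
        try:
            update = {"status": "done", "result": work()}
        except Exception as exc:  # keep the failure visible to the poller
            update = {"status": "failed", "result": str(exc)}
        with self._lock:
            self._jobs[job_id] = update

    def get(self, job_id: str) -> dict:
        """What GET /api/jobs/{id} would return."""
        with self._lock:
            return dict(self._jobs[job_id])
```

The two public methods map one-to-one onto the two endpoints. Moving the dictionary into PostgreSQL lets the job state survive a container restart, which the in-memory version does not.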
---
## Step 8 — Deploy to optical-dev
See [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]] for full details.
```bash
# Pick a free port (8011-8039 range is safe)
# Create deploy/apache-{project}.conf fragment
# Run deploy.sh on server
ssh optical-dev "cd /opt/{project} && git pull && bash deploy.sh"
```
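A quick way to sanity-check a port candidate on the server is to try binding to it. The helper below is a sketch; `first_free_port` is a hypothetical name, and reading the safe range as 8011 to 8039 is an assumption. It only detects ports in use right now, so the optical-dev port table remains the authority for ports reserved by stopped projects.

```python
import socket
from typing import Iterable, Optional


def first_free_port(candidates: Iterable[int], host: str = "127.0.0.1") -> Optional[int]:
    """Return the first candidate port that nothing is currently bound to."""
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))  # succeeds only if the port is unused right now
            except OSError:
                continue
            return port
    return None
```

Called as `first_free_port(range(8011, 8040))`, it returns the lowest unused port in that range, or `None` when all of them are taken.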
Minimum `deploy.sh` structure:
1. `git pull origin main`
2. Docker image rebuild (hash-based cache)
3. `docker compose -f docker-compose.prod.yml up -d`
4. Postgres readiness wait → `alembic upgrade head`
5. `npm run build` with `VITE_BASE_PATH`
6. `rsync dist/ → /var/www/html/{project}/`
7. Apache Include injection (idempotent)
8. `apache2ctl configtest && systemctl reload apache2`
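Step 2's hash-based cache can be sketched as follows. The actual `deploy.sh` logic is not shown in this checklist, so this is one plausible reading: hash the build inputs, compare against a digest stored under `.deploy_state/`, and rebuild only on a mismatch. The function names and the state file layout are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def inputs_digest(paths: Iterable[Path]) -> str:
    """Hash file names and contents in sorted order so the digest is stable."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(str(path).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_rebuild(paths: Iterable[Path], state_file: Path) -> bool:
    """Return True when the build inputs differ from the last recorded build."""
    previous = state_file.read_text().strip() if state_file.exists() else ""
    return inputs_digest(paths) != previous


def remember_build(paths: Iterable[Path], state_file: Path) -> None:
    """Record the current digest; call this only after the build succeeded."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(inputs_digest(paths))
```

Keeping the comparison and the write separate matters: if the digest were stored before the build ran, a failed build would be skipped on the next deploy.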
---
## Step 9 — Project Note Final Update
After first deploy, update Obsidian note:
- `url:` → actual deployment URL
- `server:` → `optical-dev` or `optical-web-1`
- `deploy:` → exact command
- Add first session entry
---
## Related
- [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]] — Apache patterns, port table, deploy scripts
- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]] — no WebSockets rule
- [[wiki/tech-patterns/azure-ad-msal-auth|azure-ad-msal-auth]] — auth setup
- [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — AI cost tracking
- [[wiki/tech-patterns/fastapi-python-docker|fastapi-python-docker]] — backend patterns
- [[wiki/tech-patterns/react-vite-typescript|react-vite-typescript]] — frontend patterns


@ -0,0 +1,256 @@
---
title: "Troubleshooting Playbooks"
description: "Failure pattern → diagnosis → fix for FastAPI, Docker, React/Vite, Azure AD, Apache, PostgreSQL across Oliver projects"
tags: [architecture, troubleshooting, debugging, fastapi, docker, apache]
created: 2026-04-27
updated: 2026-04-27
---
# Troubleshooting Playbooks
Quick lookup: symptom → confirmed root cause → exact fix. Updated from real incidents.
## Key Takeaways
- Most 502s on optical-dev are Apache config errors, not dead containers
- Container "running" ≠ app healthy — always check `docker logs`
- Azure AD 403s are usually wrong auth flow type, not wrong permissions
- `alembic upgrade head` silently succeeds even if DB isn't ready — always wait for pg_isready first
- Next.js basePath must be in ALL redirect strings — middleware doesn't auto-prepend
---
## FastAPI
### 500 Internal Server Error on startup
```bash
docker logs {container} --tail 50
```
Common causes:
| Symptom | Cause | Fix |
|---------|-------|-----|
| `ModuleNotFoundError` | Missing package in requirements.txt | Add the package to `requirements.txt` (or `pip install X && pip freeze > requirements.txt`), then rebuild the image |
| `Could not connect to postgres` | DB container not ready | Add pg_isready loop before uvicorn |
| `pydantic ValidationError` on startup | Missing env var | Check `.env` vs `.env.example` diff |
| `Address already in use` | Port conflict | Check `docker ps` + port table |
### CORS errors from browser
```python
# app/main.py: list exact origins, no trailing slash
from fastapi.middleware.cors import CORSMiddleware

# `app` is the existing FastAPI() instance defined earlier in this file
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173", "https://optical-dev.oliver.solutions"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
Never use `allow_origins=["*"]` with `allow_credentials=True` — browser blocks it.
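The two rules here (exact origins without a trailing slash, no wildcard together with credentials) are easy to check before the app starts. The validator below is a hypothetical helper, not part of FastAPI or Starlette; it only inspects the list you intend to pass to the middleware.

```python
from typing import List


def check_cors_origins(origins: List[str], allow_credentials: bool) -> List[str]:
    """Return a list of problems found in a CORS origin list (empty means OK)."""
    problems = []
    for origin in origins:
        if origin == "*":
            if allow_credentials:
                problems.append("wildcard origin cannot be combined with credentials")
            continue
        if not origin.startswith(("http://", "https://")):
            problems.append(f"{origin}: missing scheme")
        if origin.endswith("/"):
            problems.append(f"{origin}: trailing slash")
    return problems
```

Running it against origins loaded from an env var at startup surfaces a bad value as a clear message rather than as an opaque browser-side CORS failure.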
### `max_tokens` rejected by OpenAI
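OpenAI reasoning models (`o1`, `o3`) reject `max_tokens`; use `max_completion_tokens` instead. `gpt-4o` still accepts `max_tokens`, but OpenAI marks the parameter as deprecated in favor of `max_completion_tokens`, so prefer the new name everywhere. See [[wiki/concepts/openai-max-completion-tokens|openai-max-completion-tokens]].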
---
## Docker / Docker Compose
### Container exits immediately (exit code 1)
```bash
docker compose logs api # see actual error
docker compose ps # check status
```
### `docker compose up` runs old image despite code change
```bash
docker compose build --no-cache api # force rebuild
docker compose up -d
```
### Volume permissions error (PostgreSQL won't start)
```bash
docker compose down -v # destroy volumes (loses data in dev)
docker compose up -d
```
### Build fails with `COPY failed: file not found`
Check that Dockerfile paths are relative to build context. Typical mistake: `COPY requirements.txt .` when build context is repo root but requirements.txt is in `backend/`.
```yaml
# docker-compose.yml
services:
  api:
    build:
      context: . # repo root
      dockerfile: backend/Dockerfile
```
### Port already in use on server
```bash
sudo ss -tlnp | grep :8010 # find who holds the port
docker ps --format "{{.Names}} {{.Ports}}"
```
---
## React / Vite
### Blank page after deploy to optical-dev subpath
Missing `VITE_BASE_PATH` in build or React Router basename. See [[wiki/concepts/nextjs-basepath-auth-redirects|nextjs-basepath-auth-redirects]].
```bash
# Build must include base path
VITE_BASE_PATH=/my-project npm run build
# vite.config.ts
base: process.env.VITE_BASE_PATH ?? "/"
# main.tsx
<BrowserRouter basename={import.meta.env.VITE_BASE_PATH ?? "/"}>
```
### API calls return 404 in prod (work in dev)
All fetch calls must include the subpath prefix in prod:
```ts
const API = import.meta.env.VITE_BASE_PATH ?? "";
fetch(`${API}/api/jobs`) // NOT: fetch("/api/jobs")
```
### TypeScript build fails on CI but not locally
Usually an import that works with loose tsconfig locally. Run:
```bash
npx tsc --noEmit # strict check without building
```
Common: unused variables → prefix with `_`. Import of non-existent export → check barrel files.
---
## Azure AD / MSAL
### 403 `ErrorAccessDenied` from Graph API
Not a permissions problem — it's the wrong auth flow. See [[wiki/connections/graph-api-vs-msal-app-vs-delegated|graph-api-vs-msal-app-vs-delegated]].
- If **script/server task** (no user) → use app-only (client credentials) with Application permissions
- If **user-facing app** → use delegated (MSAL PKCE) with Delegated permissions
### `state mismatch` on MSAL redirect
See [[wiki/concepts/msal-vanilla-js-pkce|msal-vanilla-js-pkce]]. Causes:
- Multiple MSAL instances initialized (check for duplicate `new PublicClientApplication()`)
- Redirect URI in Azure Portal doesn't exactly match (trailing slash matters)
- Wrong platform type in Azure Portal — SPAs need **Single Page Application** platform, not Web
### Azure AD login works locally but fails on server
Redirect URI registered in Azure Portal must include the production URL. Checklist:
1. Azure Portal → App Registration → Authentication → Redirect URIs
2. Add `https://optical-dev.oliver.solutions/project/auth/callback`
3. `VITE_AZURE_REDIRECT_URI` in production `.env` must match exactly
### `DISABLE_AUTH=true` not working on server
Check that env var is in the correct service's `.env`, not root `.env`. In Docker Compose:
```yaml
env_file: ./backend/.env # not ./.env
```
---
## Apache on optical-dev
### 502 Bad Gateway
Almost always Apache config, not a dead container. See [[wiki/concepts/proxmox-container-502-misdiagnosis|proxmox-container-502-misdiagnosis]] for the general pattern.
Checklist:
```bash
# 1. Is the container actually running?
ssh optical-dev "docker ps | grep {project}"
# 2. Is the app listening on the expected port?
ssh optical-dev "curl -s http://localhost:8010/health"
# 3. Is Apache config valid?
ssh optical-dev "sudo apache2ctl configtest"
# 4. Check Apache error log
ssh optical-dev "sudo tail -50 /var/log/apache2/error.log"
```
Common config mistakes:
- `ProxyPass /project/api/` before `Alias /project` → ProxyPass wins for everything → 502 on static files
- Missing `proxy_http` module: `sudo a2enmod proxy proxy_http && sudo systemctl reload apache2`
- Include fragment added to wrong location in vhost (after `</VirtualHost>`)
### 404 on SPA deep links (works on `/` but not `/project/dashboard`)
SPA routing needs `FallbackResource` or RewriteRule to serve index.html:
```apache
<Directory /var/www/html/my-project>
    RewriteEngine On
    RewriteBase /my-project/
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^ index.html [L]
</Directory>
```
### Apache Include ordering conflict
If a previous project's Apache config has a catch-all `ProxyPass / http://...` — it will intercept all requests including your new project. Fix: reorder Include lines so your project's specific paths come before catch-alls.
```bash
# Check current Include order
ssh optical-dev "cat /etc/apache2/sites-available/optical-dev.oliver.solutions.conf | grep Include"
```
---
## PostgreSQL / Alembic
### Alembic `Can't locate revision` error
Migration files out of sync. Fix:
```bash
docker compose exec api alembic history # see all revisions
docker compose exec api alembic current # see DB state
docker compose exec api alembic stamp head # mark as current (use carefully)
```
### Migration silently does nothing
Postgres container isn't ready when alembic runs. Add readiness check:
```bash
until docker compose exec -T db pg_isready -U $POSTGRES_USER; do sleep 1; done
docker compose exec -T api alembic upgrade head
```
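When the readiness wait has to run from Python rather than the shell, a TCP-level check is a rough substitute. The sketch below assumes nothing beyond the standard library; `wait_for_port` is an illustrative name. A successful TCP connect is weaker than `pg_isready`, since Postgres can accept connections while still starting up, so treat it as a first gate only.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection succeeds, False if the timeout passes first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # interval doubles as the per-attempt connect timeout; keep it above zero
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Unlike the shell loop above, this version gives up after `timeout` seconds, so a database that never comes up fails the deploy instead of hanging it.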
### Column already exists error on re-deploy
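Migration is not idempotent. Use `IF NOT EXISTS` in raw SQL migrations, or inspect the live schema before changing it (for example `sqlalchemy.inspect(op.get_bind()).get_columns("table_name")`) and skip the operation when the column is already there. Checking `op.get_context().dialect.name` only tells you which database engine is in use, not whether the column exists.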
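The guard logic behind an idempotent "add column" step can be shown with the standard library alone. This sketch uses SQLite so it runs anywhere; it is not Alembic code, and the table and column names in the usage example are made up. In a real migration the same check would be made against PostgreSQL before calling `op.add_column`.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str, column: str, ddl_type: str) -> bool:
    """Add the column only when the table does not already have it."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    # Identifiers are interpolated directly, so pass only trusted migration constants.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True
```

Running it twice against the same table adds the column the first time and returns `False` the second time, which is the behavior a re-run deploy needs.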
---
## Related
- [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]] — Apache patterns
- [[wiki/architecture/new-project-checklist|new-project-checklist]] — setup guide
- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]] — no WebSockets
- [[wiki/tech-patterns/azure-ad-msal-auth|azure-ad-msal-auth]] — Azure AD setup
- [[wiki/concepts/monorepo-deploy-script-pitfall|monorepo-deploy-script-pitfall]] — deploy.sh gotchas


@ -0,0 +1,69 @@
---
title: "Client Knowledge: 3M"
description: "3M-specific context: OMG Portal, One2Edit API proxy, two-step auth, embedded editor"
tags: [client-knowledge, 3m]
created: 2026-04-27
updated: 2026-04-27
---
# Client Knowledge: 3M
## Key Takeaways
- One active project: 3M OMG Portal — Node.js CORS proxy for One2Edit translation management
- One2Edit has no CORS headers — all API calls must go through the Node.js proxy
- Two-step auth: service account for job listing, externSessionId for embedded editor
- No build step, no database — plain HTML/JS pages
---
## Projects
| Project | Server | Stack | Status | Purpose |
|---------|--------|-------|--------|---------|
| [[01 Projects/3m-portal/3M OMG Portal\|3M OMG Portal]] | baic (web-03) | Node.js + Vanilla JS + One2Edit API | active | Translation job management portal wrapping One2Edit |
---
## OMG Portal — Key Facts
**What it does:** 3-page portal wrapping `oliver.one2edit.com` API for managing 3M translation jobs
**Page flow:**
1. `login.html` — Two-step: username → userId, then externSessionId
2. `dashboard.html` — Job list (STARTED/RUNNING status), progress bars, PDF export
3. `editor.html` — Embedded One2Edit JS SDK using externSessionId
**Two auth modes:**
- **Service account** (`portal@oliver.agency`): used for job listing — gets stable userId
- **Session-based** (`externSessionId`): used for embedded editor — expires after session
**One2Edit API:** `https://oliver.one2edit.com/v3/Api.php` — CORS blocked, all calls proxy through `localhost:3000/api`.
**Proxy behavior in server.js:**
- Strips/rewrites Location headers on 301/302 → returns 401 (prevents auth redirect loops)
- Injects CORS headers on all `/api` responses
- Masks passwords in server logs
**Dev start:**
```bash
npm start # or: node server.js → http://localhost:3000
```
No build step — edit HTML/JS files directly.
---
## One2Edit Platform Notes
Same platform used by H&M. See [[wiki/client-knowledge/hm|H&M client knowledge]] and [[wiki/tech-patterns/one2edit-api|one2edit-api]] for full API details.
**Key quirk:** `sessionStorage` is used (not localStorage) — session is cleared on browser close. Users must log in again each browser session. This is intentional for security.
---
## Related
- [[wiki/tech-patterns/one2edit-api|one2edit-api]] — One2Edit API patterns (shared with H&M)
- [[wiki/tech-patterns/nodejs-vanilla-proxy|nodejs-vanilla-proxy]] — Node.js proxy pattern
- [[wiki/client-knowledge/hm|hm]] — H&M uses the same One2Edit platform


@ -17,6 +17,9 @@ Per-client notes for clients with 2+ active projects. Covers tech preferences, c
| Ford | Ford QC, Ford SFTP | Box API, SFTP, systemd, Python | [[wiki/client-knowledge/ford\|ford]] |
| H&M | O2E Tool, EMS Report | One2Edit API, Python, JSON | [[wiki/client-knowledge/hm\|hm]] |
| L'Oréal | Global Kickoff, SLA Calculator | Box API, PHP, Make.com, Docker | [[wiki/client-knowledge/loreal\|loreal]] |
| Barclays | Mod Comms, Banner Builder | FastAPI, React, Gemini, GCP, Docker | [[wiki/client-knowledge/barclays\|barclays]] |
| Ferrero | AC Booking Tool | Node.js, Box API, CSV/OMG | [[wiki/client-knowledge/ferrero\|ferrero]] |
| 3M | OMG Portal | Node.js, Vanilla JS, One2Edit proxy | [[wiki/client-knowledge/3m\|3m]] |
## Single-Project Clients
These clients have only one project — context lives in the project note:


@ -0,0 +1,99 @@
---
title: "Client Knowledge: Barclays"
description: "Barclays-specific context: projects, tech constraints, deployment quirks, and lessons learned"
tags: [client-knowledge, barclays]
created: 2026-04-27
updated: 2026-04-27
---
# Client Knowledge: Barclays
## Key Takeaways
- Two active projects: Mod Comms (GCP, multi-agent AI) and Banner Builder (optical-dev, React+FastAPI)
- Barclays requires strict brand compliance — logo versions matter, Barclays design tokens used in UI
- GCP deployment = no WebSockets — REST polling is mandatory for Mod Comms
- Banner Builder uses Zustand for workflow state management (journey store pattern)
---
## Projects
| Project | Server | Stack | Status | Purpose |
|---------|--------|-------|--------|---------|
| [[01 Projects/modcomms/Mod Comms\|Mod Comms]] | GCP | FastAPI + React + Gemini + PostgreSQL | active | AI proof review — compliance/brand/tone/channel checks |
| [[01 Projects/Barclays-banner-builder/Barclays Banner Builder\|Banner Builder]] | optical-dev | FastAPI + React + PostgreSQL + Docker | active | AI banner generation tool — Brief → Variants → Edit → Export |
---
## Mod Comms — Key Facts
**What it does:** Upload proof (image/PDF) → 4 AI agents analyze in parallel → lead agent synthesizes verdict
**4 agents:** Legal compliance, Brand adherence, Tone of Voice, Channel suitability
**AI:** Google Gemini Pro (primary) + Flash (fallback) — chosen for GCP co-location
**Critical incident (2026-03-18):** WebSocket connections dropped at 30s on GCP LB → switched to REST polling. See [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]].
**Auth:** Azure AD (MSAL) — uses `DISABLE_AUTH=true` locally
**Dev start:**
```bash
# Backend
cd backend && uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Frontend
cd frontend && npm install && npm run dev
# DB migrations
cd backend && alembic upgrade head
```
**Env vars (backend):**
```
GEMINI_API_KEY=
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/modcomms
AZURE_TENANT_ID=
AZURE_CLIENT_ID=
DISABLE_AUTH=true
```
---
## Banner Builder — Key Facts
**What it does:** AI-assisted banner creation. Workflow: Brief → Edit Variants → Banner Editor → Export CSV/PDF
**Workflow state:** Managed with Zustand `journey store` — backward navigation allowed, forward steps grayed out until completed. See [[wiki/concepts/export-endpoint-filter-pattern|export-endpoint-filter-pattern]].
**Export quirk:** PDF/CSV exports must receive `variant_ids` from frontend — backend cannot infer selection. Always pass explicitly.
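
A minimal sketch of the export rule above, assuming a hypothetical `select_variants` helper and variants represented as dicts with an `id` key (neither is confirmed by the project source). It rejects an empty or unknown selection rather than silently exporting everything:

```python
from typing import Iterable


def select_variants(all_variants: list[dict], variant_ids: Iterable[str]) -> list[dict]:
    """Return only the variants the frontend explicitly selected.

    Hypothetical helper: the real export endpoint may be structured differently.
    """
    wanted = list(dict.fromkeys(variant_ids))  # de-duplicate, keep request order
    if not wanted:
        # The backend cannot infer the selection, so an empty list is an error.
        raise ValueError("variant_ids is required for export")

    by_id = {v["id"]: v for v in all_variants}
    missing = [vid for vid in wanted if vid not in by_id]
    if missing:
        raise ValueError(f"unknown variant_ids: {missing}")

    return [by_id[vid] for vid in wanted]
```

Failing loudly on an empty list keeps a frontend bug from turning into an export of every variant.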
**Deploy:** optical-dev at `/barclays-banner-builder/` subpath. Deploy via `bash deploy.sh` on server.
**Apache config:** Barclays Include fragment at `/opt/barclays-banner-builder/deploy/apache-barclays.conf`. Port: 8010.
**Critical incident (2026-04-17):** Apache Include directive ordering — Banner Builder's conf was loading after hp-prod-tracker's catch-all `ProxyPass / http://...`, which intercepted all requests. Fixed by reordering Include lines in vhost config.
**Stack:**
- Frontend: React + TypeScript + Vite + Zustand
- Backend: FastAPI + Python + Alembic + PostgreSQL
- Auth: Azure AD (MSAL)
- Deploy: Docker Compose + Apache subpath
---
## Brand Requirements
- Logo versions matter — track which version is active (`v4`, `v5`, `v6`)
- Barclays design tokens are used in the UI (the Banner Builder journey stepper uses Barclays color tokens)
- Export outputs go to OMG media booking system — format must be exact
---
## Related
- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]] — WebSocket → REST polling
- [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]] — Banner Builder deployment
- [[wiki/tech-patterns/python-ai-agents|python-ai-agents]] — multi-agent pattern used in Mod Comms
- [[wiki/concepts/export-endpoint-filter-pattern|export-endpoint-filter-pattern]] — variant_ids in exports


@ -0,0 +1,53 @@
---
title: "Client Knowledge: Ferrero"
description: "Ferrero-specific context: AC Booking Tool, Box API, OMG CSV workflow"
tags: [client-knowledge, ferrero]
created: 2026-04-27
updated: 2026-04-27
---
# Client Knowledge: Ferrero
## Key Takeaways
- One active project: AC Booking Tool — browser-based, outputs CSV for OMG media booking system
- No Docker, no database — the simplest possible stack (Node.js + HTML/JS)
- Box API integration for "Send to OMG" folder saves
- CSV download works without the Node server — can open index.html directly
---
## Projects
| Project | Server | Stack | Status | Purpose |
|---------|--------|-------|--------|---------|
| [[01 Projects/ferrero-ac-creator/Ferrero AC Booking Tool\|AC Booking Tool]] | optical-web-1 | Node.js + HTML/JS + Box API | active | Generate CSV files for OMG media booking import |
---
## AC Booking Tool — Key Facts
**What it does:** Browser form for creating Ferrero communication asset bookings → export as CSV → import into OMG (media booking system)
**Two modes:**
1. **Standalone:** open `index.html` directly — CSV download works without server
2. **With server:** `node server.js` → `http://localhost:3456`, which enables "Send to OMG" (Box API folder save)
**Box API:** Used for saving directly to a predefined Ferrero Box folder. Uses Box OAuth or service account credentials. See [[wiki/tech-patterns/box-api-integration|box-api-integration]].
**No build step:** Plain HTML/JS, no transpilation. Editing is direct.
**Deploy:** `node server.js` on optical-web-1. No Docker and no reverse proxy: the server listens directly on its own port.
---
## OMG CSV Format
The CSV output format is dictated by OMG media booking system import requirements. Fields are Ferrero-specific (AC codes, booking dates, media channels). **Do not change column structure without confirming with client** — OMG imports are fragile to schema changes.
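
A header check before sending a file to OMG could catch accidental schema changes early. The sketch below is illustrative only: the column names in `EXPECTED_COLUMNS` are placeholders, since the real OMG import schema is not listed in this note, and `check_csv_header` is a hypothetical helper.

```python
import csv
import io

# Placeholder column list: replace with the schema confirmed by the client.
EXPECTED_COLUMNS = ["ac_code", "booking_date", "media_channel"]


def check_csv_header(csv_text: str, expected: list[str] = EXPECTED_COLUMNS) -> list[str]:
    """Return a list of problems with the header row (empty list means OK)."""
    header = next(csv.reader(io.StringIO(csv_text)), None)
    if header is None:
        return ["file is empty"]
    if header == expected:
        return []

    problems = []
    missing = [c for c in expected if c not in header]
    extra = [c for c in header if c not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not missing and not extra:
        problems.append("column order or count differs")
    return problems
```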
---
## Related
- [[wiki/tech-patterns/box-api-integration|box-api-integration]] — Box API patterns
- [[wiki/tech-patterns/nodejs-vanilla-proxy|nodejs-vanilla-proxy]] — Node.js + vanilla JS pattern


@ -8,6 +8,11 @@
| [[wiki/connections/oauth-state-mismatch-debugging]] | LibreChat OpenID ↔ MSAL SPA ↔ Azure AD — state mismatch root cause shared across implementations | daily/2026-04-15.md | 2026-04-15 |
| [[wiki/connections/graph-api-vs-msal-app-vs-delegated]] | Graph API app-only ↔ MSAL delegated — choosing the right Azure AD auth flow; why delegated 403s on shared mailboxes | daily/2026-04-16.md | 2026-04-16 |
| [[wiki/connections/lxc-networking-api-failures]] | LXC ARP cache ↔ Node.js SSL — two orthogonal root causes that produce identical "API Error" symptoms | daily/2026-04-19.md | 2026-04-19 |
| [[wiki/connections/fastapi-azuread-docker-trinity]] | FastAPI ↔ Azure AD ↔ Docker — the three always go together; env vars, CORS, auth middleware wiring | 2026-04-27 | 2026-04-27 |
| [[wiki/connections/ai-always-needs-cost-tracker]] | AI API calls ↔ cost-tracker preflight/record — why retrofitting costs 2 days; per-provider token fields | 2026-04-27 | 2026-04-27 |
| [[wiki/connections/optical-dev-apache-vite-basepath]] | Apache subpath ↔ Vite basePath — two configs that must match; what breaks when they don't | 2026-04-27 | 2026-04-27 |
| [[wiki/connections/gcp-no-websockets]] | GCP LB 30s timeout ↔ REST polling ↔ job table pattern — infrastructure constraint forces code architecture | 2026-04-27 | 2026-04-27 |
| [[wiki/connections/box-api-hotfolder-pattern]] | Box API ↔ hotfolder daemon — always paired; archive pattern prevents double-processing | 2026-04-27 | 2026-04-27 |
<!-- Articles added automatically by compile.py -->
<!-- Format: | [[connections/slug]] | ConceptA ↔ ConceptB | daily/YYYY-MM-DD.md | date | -->


@ -0,0 +1,83 @@
---
title: "Connection: Every AI Call Needs Cost Tracker"
description: "Why ai-cost-tracker must be added at project start, not retrofitted — and how preflight/record connects to every AI provider"
connects:
- "tech-patterns/python-ai-agents"
- "tech-patterns/cost-tracker-integration"
- "concepts/preflight-record-pattern"
created: 2026-04-27
updated: 2026-04-27
---
# Connection: Every AI Call Needs Cost Tracker
## The Connection
AI provider integrations (Gemini, Claude, OpenAI, ElevenLabs) and the Oliver cost tracker are not independent systems — the tracker's preflight/record pattern wraps every AI call. Projects built without cost tracking have no budget enforcement, and retroactively adding it means touching every AI call site.
## Key Insight
**Adding cost-tracker at project start costs 30 minutes. Retrofitting costs 2 days.** `preflight()` runs before the AI call and `record()` runs after it. The tracker maintains per-project, per-team budgets and hard-blocks calls once a budget is exceeded. Without it, a runaway prompt loop or buggy retry logic can exhaust a monthly budget in minutes.
## The Pattern
```python
# Every AI call site looks like this
async def call_ai(prompt: str, user_id: str):
    # 1. Estimate and gate
    preflight_resp = await cost_tracker.post("/preflight", json={
        "source_app": "my-project",
        "model": "gemini-1.5-pro",
        "estimated_input_tokens": len(prompt) // 4,  # rough estimate: ~4 chars per token
        "estimated_output_tokens": 500,
        "user_id": user_id
    })
    if not preflight_resp.json()["allow"]:
        raise BudgetExceeded("Monthly budget reached")
    # 2. Make the actual AI call
    response = await gemini.generate(prompt)
    # 3. Record actual usage
    # Token field names vary by provider; see Per-Provider Gotchas below
    await cost_tracker.post("/usage/record", json={
        "source_app": "my-project",
        "model": "gemini-1.5-pro",
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens
    })
    return response
```
## Per-Provider Gotchas
| Provider | Token field | Notes |
|----------|------------|-------|
| Gemini | `response.usage_metadata.prompt_token_count` | Available after call |
| OpenAI | `response.usage.prompt_tokens` | `max_completion_tokens` not `max_tokens` for o1/o3 |
| Claude | `response.usage.input_tokens` | Via Anthropic SDK |
| ElevenLabs | `len(text)` characters | No token concept — billed by chars |
| GCP TTS | `len(text)` characters | Same as ElevenLabs |
See [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] for full details.
## Why Retrofitting Is Hard
1. Every AI call site must be wrapped; in multi-agent systems, that's 4-10 places
2. The async httpx client needs to be shared (not a new client per call)
3. Budget workspace/team/project hierarchy must be set up in the admin UI first
4. Historical data is lost — you can't see what projects cost before integration
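
One way to limit the number of call sites that need touching (point 1) is to route every AI call through a single wrapper. The sketch below assumes the preflight check, the AI call, and the usage record are passed in as async callables; `tracked_call` is a hypothetical name, not part of the cost tracker's API.

```python
import asyncio
from typing import Any, Awaitable, Callable


class BudgetExceeded(Exception):
    """Raised when the tracker's preflight check denies the call."""


async def tracked_call(
    preflight: Callable[[], Awaitable[bool]],
    call: Callable[[], Awaitable[Any]],
    record: Callable[[Any], Awaitable[None]],
) -> Any:
    """Run one AI call between a preflight gate and a usage record."""
    if not await preflight():
        raise BudgetExceeded("Monthly budget reached")
    response = await call()
    await record(response)  # only reached when the AI call succeeded
    return response
```

With this shape, each provider only needs its own `record` callable that knows which usage fields to read.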
## Projects with Cost Tracker
ai-cost-tracker itself, Video Accessibility Platform, Enterprise Nexus (in progress).
## Projects that Need It Added
Any project making AI API calls without cost tracking: Mod Comms (Gemini), PIMCO Charts (Claude), Sandbox NotebookLM, Oliver AI Bot 2.0, Semblance.
## Related
- [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — 9-step integration guide
- [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] — per-provider billing units
- [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] — the 3-step pattern
- [[wiki/tech-patterns/python-ai-agents|python-ai-agents]] — AI agent patterns


@ -0,0 +1,89 @@
---
title: "Connection: Box API + Hotfolder Daemon — Always Together"
description: "Box API for asset access and the hotfolder daemon for folder monitoring are paired in all Ford and L'Oréal workflows"
connects:
- "tech-patterns/box-api-integration"
- "architecture/hotfolder-daemon"
created: 2026-04-27
updated: 2026-04-27
---
# Connection: Box API + Hotfolder Daemon — Always Together
## The Connection
Box API integration at Oliver always comes with a hotfolder daemon pattern. The Box API reads/writes assets; the hotfolder daemon watches specific folders for new files and triggers processing. They solve complementary problems: Box API is the transport, hotfolder is the trigger.
## Key Insight
**The hotfolder archive pattern prevents double-processing.** Without it, a file would be processed repeatedly until manually removed. The pattern: detect new file → process → move to `/processed/` archive. Box webhooks are an alternative but require a public endpoint and webhook management; polling is simpler and reliable.
## The Pattern
```
Box Folder (/incoming/)
↓ daemon polls every 30s
Hotfolder Script (box_monitor.py)
↓ new file detected
Process File (download → transform → upload result)
↓ success
Move to /incoming/_processed/{date}/
↓ or on failure
Move to /incoming/_errors/
```
## Implementation
```python
# Daemon loop (systemd service)
while True:
    items = box_folder.get_items(fields=["id", "name", "created_at"])
    for item in items:
        if item.type == "file" and not is_processed(item.id):
            process_file(item)
            move_to_archive(item, processed_folder)
    time.sleep(30)
```
The daemon runs as a systemd service on box-cli-01 (CentOS 7):
```ini
[Service]
ExecStart=/usr/bin/python3 /opt/ford-qc/box_monitor.py
Restart=always
RestartSec=10
```
## Box API Auth
Two modes used across Oliver projects:
| Mode | Used for | Credential |
|------|---------|-----------|
| Service account (JWT) | Server daemons (hotfolder, scheduled jobs) | `config.json` from Box Developer Console |
| OAuth 2.0 | User-facing tools (Ferrero AC Booking) | Client ID + Secret + redirect URI |
For hotfolder daemons: always use service account (JWT). No user interaction, runs 24/7.
## Where This Pattern Is Used
| Project | Client | What it does |
|---------|--------|-------------|
| Ford QC System | Ford | Watch Box → download proofs → AI quality check → write result |
| Ford SFTP Transfer | Ford | Watch Box → SFTP push to Ford servers |
| Ferrero AC Booking | Ferrero | "Send to OMG" button → upload CSV to Box folder |
| L'Oréal Global Kickoff | L'Oréal | Box asset management for kickoff materials |
## Gotchas
- **`_processed/` must be excluded from monitoring** — otherwise the daemon reprocesses archived files
- **Box rate limits:** 10 API calls/second per app. For large folders, add `time.sleep(0.1)` between items
- **box-cli-01 is CentOS 7 (EOL):** No Docker, Python 3.6 default — use `/usr/bin/python3` path explicitly
- **NFS mount at `/mnt/nfs`:** box-cli-01 has 1TB NFS for large asset storage; processed files go here, not Box
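
The first two gotchas can be combined into one filter step. This is a sketch under assumptions: entries are taken to be `(file_id, path)` pairs from a recursive listing, and `is_archived` and `pending_files` are hypothetical helpers rather than Box SDK calls.

```python
import time

# Archive folder names as used in the pattern diagram above.
ARCHIVE_FOLDERS = {"_processed", "_errors"}


def is_archived(path: str) -> bool:
    """True if any path segment is one of the archive folders."""
    return any(part in ARCHIVE_FOLDERS for part in path.strip("/").split("/"))


def pending_files(entries, processed_ids, delay=0.1, sleep=time.sleep):
    """Yield (file_id, path) pairs that still need processing.

    Skips anything inside an archive folder or already processed, and pauses
    after each yielded item to stay under the API rate limit.
    """
    for file_id, path in entries:
        if is_archived(path) or file_id in processed_ids:
            continue
        yield file_id, path
        sleep(delay)
```

Passing `sleep` as a parameter keeps the throttle testable without real delays.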
## Related
- [[wiki/tech-patterns/box-api-integration|box-api-integration]] — Box API patterns and auth
- [[wiki/architecture/hotfolder-daemon|hotfolder-daemon]] — systemd daemon pattern
- [[wiki/client-knowledge/ford|ford]] — Ford QC + SFTP projects
- [[wiki/client-knowledge/loreal|loreal]] — L'Oréal Box workflows
- [[wiki/infrastructure/server-box-cli|server-box-cli]] — box-cli-01 server details


@ -0,0 +1,85 @@
---
title: "Connection: FastAPI + Azure AD + Docker — The Oliver Trinity"
description: "These three always appear together in Oliver internal tools — how they wire up and where each touches the other"
connects:
- "tech-patterns/fastapi-python-docker"
- "tech-patterns/azure-ad-msal-auth"
- "architecture/docker-compose-fullstack"
created: 2026-04-27
updated: 2026-04-27
---
# Connection: FastAPI + Azure AD + Docker — The Oliver Trinity
## The Connection
FastAPI, Azure AD MSAL, and Docker Compose appear together in almost every Oliver internal tool. They're designed independently but have specific integration points that only become clear across multiple projects.
## Key Insight
**The auth middleware must run inside the container, but credentials must come from outside it.** This creates a specific pattern: Azure AD env vars flow in through Docker Compose env_file, the MSAL JWKS validation runs in FastAPI middleware, and the Docker healthcheck must not hit authenticated endpoints (it has no token).
The three systems interact at exactly three points:
1. **JWT validation**: FastAPI reads `AZURE_TENANT_ID` + `AZURE_CLIENT_ID` from env → fetches JWKS from Azure → validates tokens from MSAL.js frontend
2. **CORS**: FastAPI CORS origins must include the exact frontend origin (no trailing slash) — when running in Docker, the origin is the host's port/domain, not the container's internal address
3. **Local dev bypass**: `DISABLE_AUTH=true` skips Azure AD entirely in dev — this env var must be in the Docker service's env_file, not just the host shell
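
The healthcheck and dev-bypass rules above reduce to a small decision function. The sketch below is framework-independent; `PUBLIC_PATHS` and `needs_token` are hypothetical names, and `/health` is only an assumed healthcheck path.

```python
import os

# Assumed path: use whatever the Docker healthcheck actually calls.
PUBLIC_PATHS = {"/health"}


def auth_disabled(env=os.environ) -> bool:
    """True only when DISABLE_AUTH is explicitly set to "true"."""
    return env.get("DISABLE_AUTH", "false").lower() == "true"


def needs_token(path: str, env=os.environ) -> bool:
    """Decide whether a request path must carry a valid bearer token."""
    if auth_disabled(env):
        return False  # local dev bypass
    return path.rstrip("/") not in PUBLIC_PATHS
```

An auth middleware could call `needs_token(request.url.path)` before attempting JWT validation, so the healthcheck never needs a token.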
## The Wiring
```
Browser (MSAL.js PKCE)
↓ acquireTokenSilent() → Azure AD → access_token (JWT)
↓ Authorization: Bearer {token}
Apache/nginx
↓ passes all headers intact (do NOT strip Authorization)
FastAPI middleware (HTTPBearer)
↓ loads OpenID metadata from https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration, then fetches the JWKS from its jwks_uri
↓ validates token signature + audience + expiry
↓ injects user claims into request.state.user
Route handlers
↓ read request.state.user.preferred_username / roles
```
## Environment Variable Pattern
```yaml
# docker-compose.yml — the right way
services:
  api:
    env_file: ./backend/.env  # contains AZURE_* + DISABLE_AUTH
  frontend:
    environment:
      - VITE_AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
      - VITE_AZURE_TENANT_ID=${AZURE_TENANT_ID}
```
```env
# backend/.env
AZURE_TENANT_ID=xxx
AZURE_CLIENT_ID=xxx
AZURE_CLIENT_SECRET=xxx # only if app-only calls needed
DISABLE_AUTH=true # remove in production
```
## When DISABLE_AUTH Breaks
Teams sometimes forget to remove `DISABLE_AUTH=true` on the server. Symptoms: authenticated routes return 200 to anyone with no token. Add a startup check:
```python
import logging
import os

if os.getenv("DISABLE_AUTH", "false").lower() == "true":
    logging.warning("⚠️ AUTH IS DISABLED: do not use in production")
```
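
A stricter variant refuses to start instead of only logging. This sketch assumes the deployment sets an `ENVIRONMENT` variable, which is a hypothetical name not defined anywhere in this note; `check_auth_config` is likewise illustrative.

```python
import os


def check_auth_config(env=os.environ) -> None:
    """Raise at startup if auth is disabled in a production environment.

    ENVIRONMENT is an assumed variable name; adapt it to the real deployment.
    """
    disabled = env.get("DISABLE_AUTH", "false").lower() == "true"
    production = env.get("ENVIRONMENT", "").lower() == "production"
    if disabled and production:
        raise RuntimeError("DISABLE_AUTH=true is not allowed in production")
```

Calling it once during application startup turns a silent misconfiguration into a failed deploy.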
## Projects Where This Trinity Appears
GMAL, Mod Comms, Video Accessibility, Semblance, Enterprise Nexus, Barclays Banner Builder, BAIC Dashboard.
## Related
- [[wiki/tech-patterns/fastapi-python-docker|fastapi-python-docker]]
- [[wiki/tech-patterns/azure-ad-msal-auth|azure-ad-msal-auth]]
- [[wiki/architecture/docker-compose-fullstack|docker-compose-fullstack]]
- [[wiki/architecture/new-project-checklist|new-project-checklist]]


@ -0,0 +1,93 @@
---
title: "Connection: GCP LB Timeout → REST Polling → Job Table Pattern"
description: "The chain from GCP infrastructure constraint to application architecture — why the 30s LB timeout forces a specific code pattern"
connects:
- "architecture/gcp-deployment-lb-timeout"
- "tech-patterns/redis-celery-worker-queue"
- "tech-patterns/fastapi-python-docker"
created: 2026-04-27
updated: 2026-04-27
---
# Connection: GCP LB Timeout → REST Polling → Job Table Pattern
## The Connection
GCP HTTP(S) load balancers apply a 30-second backend service timeout by default, which cuts off slow responses and long-lived connections such as WebSockets. This single infrastructure constraint forces a specific application architecture pattern: job tables, HTTP polling, and eventually Celery for heavy workloads. The constraint propagates from infrastructure all the way to frontend code.
## Key Insight
**The 30s timeout isn't a WebSocket-specific problem: it affects any long-running synchronous request.** A `/api/analyze` endpoint that takes 45 seconds will also be killed. The fix isn't just replacing WebSockets; every request has to return within 30 seconds, with the long-running work moved to a background job.
## The Constraint Chain
```
GCP Load Balancer: 30s backend service timeout (default)
↓ forces
No WebSockets (connection held open → killed at 30s)
↓ forces
No long synchronous API calls > 30s
↓ forces
Job table pattern: POST creates job → GET polls status
↓ for heavy parallelism:
Redis + Celery for worker queue management
```
## Minimum Implementation (no Celery needed)
```python
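import asyncio
from fastapi import Depends, HTTPException

# Keep references so pending tasks are not garbage-collected mid-run
background_tasks: set[asyncio.Task] = set()

# POST /api/jobs → returns immediately with job_id
@router.post("/jobs")
async def create_job(request: JobRequest, db: AsyncSession = Depends(get_db)):
    job = Job(status="pending", ...)
    db.add(job)
    await db.commit()
    await db.refresh(job)  # reload the generated id after commit
    task = asyncio.create_task(run_job(job.id))  # fire and forget
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return {"job_id": job.id}

# GET /api/jobs/{id} → returns current status
@router.get("/jobs/{job_id}")
async def get_job(job_id: str, db: AsyncSession = Depends(get_db)):
    job = await db.get(Job, job_id)
    if job is None:
        raise HTTPException(status_code=404, detail="Job not found")
    return {"status": job.status, "result": job.result}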
```
```ts
// Frontend: poll every 2s until done
const poll = async (jobId: string) => {
  while (true) {
    const r = await fetch(`${API}/api/jobs/${jobId}`);
    const { status, result } = await r.json();
    if (status === "done") return result;
    if (status === "error") throw new Error(result);
    await new Promise(r => setTimeout(r, 2000));
  }
};
```
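
The frontend loop above polls with no upper bound. For scripts or tests that call the same endpoints, a bounded version might look like the following sketch; `poll_job` and its parameters are hypothetical, and `fetch_status` stands in for whatever performs the `GET /api/jobs/{id}` request.

```python
import time


def poll_job(fetch_status, interval=2.0, timeout=300.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll until the job finishes, fails, or the deadline passes.

    fetch_status is any callable returning a dict with "status" and "result",
    mirroring the job status response shown above.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job["status"] == "done":
            return job["result"]
        if job["status"] == "error":
            raise RuntimeError(str(job["result"]))
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the deadline")
        sleep(interval)
```

The deadline guards against a job that is stuck in `pending` because its background task died.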
## When to Add Celery
The simple `asyncio.create_task()` pattern breaks when:
- Multiple AI agents run in parallel and saturate the event loop
- Tasks need retry logic on failure
- You need visibility into queue depth and worker utilization
- Jobs run for more than 5 minutes
For those cases: see [[wiki/tech-patterns/redis-celery-worker-queue|redis-celery-worker-queue]].
## Applies Also to optical-dev
Apache + corporate LB timeout is 30-60s. The same pattern is required on optical-dev, not just GCP.
## Projects Using This Pattern
- Mod Comms: switched from WebSocket after production incident (2026-03-18)
- Enterprise Nexus: HTTP polling from day 1
- Video Accessibility: Celery for heavy FFmpeg + AI pipeline
## Related
- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]] — full implementation
- [[wiki/tech-patterns/redis-celery-worker-queue|redis-celery-worker-queue]] — when Celery is needed
- [[wiki/architecture/adr-log|adr-log]] — ADR-001 documents this decision
- [[wiki/client-knowledge/barclays|barclays]] — Mod Comms incident


@ -0,0 +1,102 @@
---
title: "Connection: optical-dev Apache Subpath + Vite basePath"
description: "The two-system pairing that makes React SPAs work at /project-name/ — Apache and Vite must be configured together or nothing works"
connects:
- "architecture/optical-dev-server-deploy"
- "tech-patterns/react-vite-typescript"
- "concepts/nextjs-basepath-auth-redirects"
created: 2026-04-27
updated: 2026-04-27
---
# Connection: optical-dev Apache Subpath + Vite basePath
## The Connection
React SPAs deployed to optical-dev live at a URL subpath (such as `/barclays-banner-builder/`), not the domain root. Apache and Vite must both be configured with the same base path; a mismatch on either side causes different but equally confusing failures.
## Key Insight
**Apache routes to the SPA. Vite tells the SPA where it lives.** These are two independent configurations that must match. A mismatch produces symptoms that look like routing bugs or blank pages, not configuration errors.
## The Two Configurations That Must Match
### Apache fragment (`/opt/project/deploy/apache-project.conf`)
```apache
# API proxy (MUST come before Alias)
ProxyPass /project-name/api/ http://127.0.0.1:8010/api/
ProxyPassReverse /project-name/api/ http://127.0.0.1:8010/api/
# SPA static files
Alias /project-name /var/www/html/project-name
<Directory /var/www/html/project-name>
    RewriteEngine On
    RewriteBase /project-name/
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^ index.html [L]
</Directory>
```
### Vite + React (`vite.config.ts` + `main.tsx`)
```ts
// vite.config.ts
export default defineConfig({
  base: process.env.VITE_BASE_PATH ?? "/",
})
// main.tsx
<BrowserRouter basename={import.meta.env.VITE_BASE_PATH ?? "/"}>
// api calls — must include base path
const API = import.meta.env.VITE_BASE_PATH ?? "";
fetch(`${API}/api/endpoint`)
```
### Build command
```bash
VITE_BASE_PATH=/project-name npm run build
```
## What Breaks When Misconfigured
| Misconfiguration | Symptom |
|-----------------|---------|
| Apache has subpath, Vite doesn't | App loads at `/project-name/` but JS assets 404 (wrong asset paths) |
| Vite has basePath, Apache doesn't | 404 on all subpath URLs |
| API calls don't include basePath | API calls work locally (no Apache), fail in prod |
| ProxyPass AFTER Alias | Apache serves `index.html` for all `/api/` calls → API returns HTML → JSON parse error |
| Azure AD redirect URI doesn't include basePath | Auth callback 404 after login |
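
The first row of the table can be detected before deploying by inspecting the built `index.html`. The following is a rough sketch: `misplaced_assets` is a hypothetical helper, and the regex only handles double-quoted `src`/`href` attributes.

```python
import re


def misplaced_assets(index_html: str, base_path: str) -> list[str]:
    """Return root-relative src/href values that do not start with base_path."""
    trimmed = base_path.strip("/")
    prefix = f"/{trimmed}/" if trimmed else "/"
    urls = re.findall(r'(?:src|href)="([^"]+)"', index_html)
    # Only root-relative URLs matter; skip absolute and protocol-relative ones.
    local = [u for u in urls if u.startswith("/") and not u.startswith("//")]
    return [u for u in local if not u.startswith(prefix)]
```

An empty result for the deployed subpath suggests the build picked up `VITE_BASE_PATH`; any listed path would 404 behind the Apache subpath.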
## Auth + basePath
Azure AD redirect URI must include the subpath. See [[wiki/concepts/nextjs-basepath-auth-redirects|nextjs-basepath-auth-redirects]]:
```
✅ https://optical-dev.oliver.solutions/project-name/auth/callback
❌ https://optical-dev.oliver.solutions/auth/callback
```
In MSAL config:
```ts
auth: {
  redirectUri: `${import.meta.env.VITE_BASE_PATH ?? ""}/auth/callback`
}
```
## Checklist When Deploying a New SPA to optical-dev
- [ ] Apache fragment: ProxyPass before Alias, trailing slashes on ProxyPass
- [ ] Vite config: `base: VITE_BASE_PATH`
- [ ] React Router: `basename={VITE_BASE_PATH}`
- [ ] All `fetch()` calls: prefix with `VITE_BASE_PATH`
- [ ] Build command: `VITE_BASE_PATH=/project npm run build`
- [ ] Azure AD redirect URI: includes subpath in Portal + env var
- [ ] Apache configtest passes before reload
## Related
- [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]] — full Apache config reference
- [[wiki/tech-patterns/react-vite-typescript|react-vite-typescript]] — Vite patterns
- [[wiki/concepts/nextjs-basepath-auth-redirects|nextjs-basepath-auth-redirects]] — basePath + auth
- [[wiki/architecture/troubleshooting-playbooks|troubleshooting-playbooks]] — when it breaks