From 401c22d9ffc8be5609462cfd94b90d54fb14e89d Mon Sep 17 00:00:00 2001
From: Vadym Samoilenko
Date: Mon, 27 Apr 2026 18:22:09 +0100
Subject: [PATCH] vault backup: 2026-04-27 18:22:09

---
 .../Barclays Banner Builder.md                |  25 +-
 99 Daily/2026-04-27.md                        |   6 +
 CLAUDE.md                                     |  12 +
 wiki/_master-index.md                         |   6 +-
 wiki/architecture/_index.md                   |  11 +-
 wiki/architecture/adr-log.md                  | 182 +++++++++++++
 wiki/architecture/new-project-checklist.md    | 221 +++++++++++++++
 .../architecture/troubleshooting-playbooks.md | 256 ++++++++++++++++++
 wiki/client-knowledge/3m.md                   |  69 +++++
 wiki/client-knowledge/_index.md               |   3 +
 wiki/client-knowledge/barclays.md             |  99 +++++++
 wiki/client-knowledge/ferrero.md              |  53 ++++
 wiki/connections/_index.md                    |   5 +
 .../ai-always-needs-cost-tracker.md           |  83 ++++++
 wiki/connections/box-api-hotfolder-pattern.md |  89 ++++++
 .../fastapi-azuread-docker-trinity.md         |  85 ++++++
 wiki/connections/gcp-no-websockets.md         |  93 +++++++
 .../optical-dev-apache-vite-basepath.md       | 102 +++++++
 18 files changed, 1387 insertions(+), 13 deletions(-)
 create mode 100644 wiki/architecture/adr-log.md
 create mode 100644 wiki/architecture/new-project-checklist.md
 create mode 100644 wiki/architecture/troubleshooting-playbooks.md
 create mode 100644 wiki/client-knowledge/3m.md
 create mode 100644 wiki/client-knowledge/barclays.md
 create mode 100644 wiki/client-knowledge/ferrero.md
 create mode 100644 wiki/connections/ai-always-needs-cost-tracker.md
 create mode 100644 wiki/connections/box-api-hotfolder-pattern.md
 create mode 100644 wiki/connections/fastapi-azuread-docker-trinity.md
 create mode 100644 wiki/connections/gcp-no-websockets.md
 create mode 100644 wiki/connections/optical-dev-apache-vite-basepath.md

diff --git a/01 Projects/Barclays-banner-builder/Barclays Banner Builder.md b/01 Projects/Barclays-banner-builder/Barclays Banner Builder.md
index ea1bdb5..96d899a 100644
--- a/01 Projects/Barclays-banner-builder/Barclays Banner Builder.md
+++ b/01 Projects/Barclays-banner-builder/Barclays Banner Builder.md
@@ -1,27 +1,34 @@
 ---
 name: "Barclays Banner Builder"
-client: "TBD"
+client: Barclays
 status: active
 server: optical-dev
-tech: []
+tech: [React, TypeScript, Vite, Zustand, FastAPI, Python, PostgreSQL, Alembic, Docker]
 local_path: /Users/ai_leed/Documents/Projects/Oliver/Barclays-banner-builder
-deploy:
-url:
+deploy: bash deploy.sh
+url: https://optical-dev.oliver.solutions/barclays-banner-builder/
 tags:
   - project
+  - client/barclays
+  - domain/ai
 created: 2026-04-17
 ---
 
 ## Overview
-> New project — fill in during first session.
+AI-assisted banner generation tool for Barclays marketing assets. Workflow: Brief → Edit Variants → Banner Editor → Export (CSV/PDF). Uses Zustand journey store for workflow state — backward navigation allowed, forward steps grayed until completed.
 
 ## Tech Stack
-- **Frontend:**
-- **Backend:**
-- **Infrastructure:**
+- **Frontend:** React 18 + TypeScript + Vite + Zustand (journey store, Barclays design tokens)
+- **Backend:** Python + FastAPI + PostgreSQL + Alembic migrations
+- **Auth:** Azure AD (MSAL)
+- **Infrastructure:** Docker Compose + Apache subpath on optical-dev (port 8010)
 
 ## Deployment
-- **Local path:** `/Users/ai_leed/Documents/Projects/Oliver/Barclays-banner-builder`
+- **Run locally:** `docker compose up --build`
+- **Server:** `ssh optical-dev "cd /opt/barclays-banner-builder && git pull && bash deploy.sh"`
+- **URL:** `https://optical-dev.oliver.solutions/barclays-banner-builder/`
+- **Apache config:** `/opt/barclays-banner-builder/deploy/apache-barclays.conf`
+- **Port:** 8010 (backend API)
 
 ## Sessions
 ### 2026-04-20 – Add visual progress indicator showing user
diff --git a/99 Daily/2026-04-27.md b/99 Daily/2026-04-27.md
index 6003434..ad51673 100644
--- a/99 Daily/2026-04-27.md
+++ b/99 Daily/2026-04-27.md
@@ -431,3 +431,9 @@ tags: [daily]
 - 18:13 | `obsidian-vault`
   - **Asked:** Check Obsidian integration and verify wiki index files contain complete information.
   - **Done:** Added automation hook to PreCompact so wiki-count-sync.py runs before each compaction to keep article counters in _master-index.md synchronized.
+- 18:18 | `aimpress`
+  - **Asked:** Configure ntfy on homelab and identify current alert destinations.
+  - **Done:** Investigated migration failure in 0056_user_roles.py and identified field rename issue in version 2026.2.2 requiring intermediate upgrade path.
+- 18:21 | `obsidian-vault`
+  - **Asked:** Check Obsidian integration and verify wiki index files contain complete required information.
+  - **Done:** Updated master-index with new counters and fixed Barclays Banner Builder note structure.
diff --git a/CLAUDE.md b/CLAUDE.md
index 5c26e1c..f3a396d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -201,6 +201,18 @@ Review the wiki for:
 - Suggest 3–5 new articles that would strengthen the knowledge base
 - **Do NOT make changes without confirmation — just report**
 
+### Q&A Auto-Save Protocol
+
+When answering a **complex technical question** during a session (debugging, architecture, how-to), save the answer to `wiki/qa/`:
+
+1. Create `wiki/qa/{topic-slug}.md` with the question as H1, the answer as body, a `## Key Takeaways` section, and `## Related` wikilinks
+2. Add an entry to `wiki/qa/_index.md`
+3. Update the qa/ count in `wiki/_master-index.md`
+
+**Threshold for saving:** The answer involves non-obvious reasoning, debugging steps, or knowledge that would take >5 minutes to rediscover. Skip questions whose answers are obvious.
+
+**Slug format:** `{problem}-{resolution}.md`, e.g. `fastapi-cors-docker-fix.md`, `azure-ad-403-wrong-flow.md`
+
 ### Auto-compilation from Sessions
 
 Sessions are captured automatically via Claude Code hooks and compiled to `wiki/concepts/`, `wiki/connections/`, and `wiki/qa/` after 21:00 (9 PM).
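The Q&A Auto-Save Protocol in the CLAUDE.md hunk above is mechanical enough to script. A minimal Python sketch, assuming the vault root is passed in and that `wiki/qa/_index.md` is a plain list of wikilinks; `save_qa`, `slugify`, and the index-line format are hypothetical, not existing vault tooling. Step 3 (the `_master-index.md` count) could be left to the existing wiki-count-sync.py hook.

```python
import re
from pathlib import Path
from typing import List


def slugify(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def save_qa(vault: Path, problem: str, resolution: str, question: str,
            answer: str, takeaways: List[str], related: List[str]) -> Path:
    """Write wiki/qa/{problem}-{resolution}.md and list it in wiki/qa/_index.md."""
    qa_dir = vault / "wiki" / "qa"
    qa_dir.mkdir(parents=True, exist_ok=True)
    slug = f"{slugify(problem)}-{slugify(resolution)}"
    article = qa_dir / f"{slug}.md"

    # Sections required by the protocol: question as H1, answer, takeaways, related links.
    lines = [f"# {question}", "", answer.strip(), "", "## Key Takeaways", ""]
    lines += [f"- {item}" for item in takeaways]
    lines += ["", "## Related", ""]
    lines += [f"- [[{link}]]" for link in related]
    article.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Append to the qa index only once, so re-saving an answer stays idempotent.
    index = qa_dir / "_index.md"
    entry = f"- [[wiki/qa/{slug}|{slug}]]"
    existing = index.read_text(encoding="utf-8") if index.exists() else ""
    if entry not in existing.splitlines():
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        with index.open("a", encoding="utf-8") as fh:
            fh.write(prefix + entry + "\n")
    return article
```

With the protocol's own example, `save_qa(vault, "FastAPI CORS", "Docker fix", ...)` writes `wiki/qa/fastapi-cors-docker-fix.md`; re-running it rewrites the article but does not duplicate the index entry.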
diff --git a/wiki/_master-index.md b/wiki/_master-index.md
index 9a3a560..b054c7e 100644
--- a/wiki/_master-index.md
+++ b/wiki/_master-index.md
@@ -21,10 +21,10 @@ This 3-hop pattern works for hundreds of articles without vector search.
 | [[wiki/obsidian-rag/_index\|obsidian-rag/]] | Karpathy's LLM wiki method — Obsidian RAG, setup, vs true RAG | 3 |
 | [[wiki/projects-overview/_index\|projects-overview/]] | All 42 Oliver Agency projects — grouped by server (optical-web-1, optical-dev, baic, box-cli) | 1 |
 | [[wiki/tech-patterns/_index\|tech-patterns/]] | Recurring tech stacks: FastAPI, React/Vite, Next.js, Azure AD, AI, Box, One2Edit, Redis/Celery, cost-tracker | 13 |
-| [[wiki/architecture/_index\|architecture/]] | Cross-cutting architectural patterns: Docker Compose, multi-agent AI, GCP timeout, RAG, hotfolder, optical-dev deploy, cost-tracker | 7 |
-| [[wiki/client-knowledge/_index\|client-knowledge/]] | Per-client notes for Ford, H&M, L'Oréal (2+ projects each) | 3 |
+| [[wiki/architecture/_index\|architecture/]] | Cross-cutting architectural patterns: Docker Compose, multi-agent AI, GCP timeout, RAG, hotfolder, optical-dev deploy, cost-tracker, new-project checklist, troubleshooting playbooks, ADR log | 10 |
+| [[wiki/client-knowledge/_index\|client-knowledge/]] | Per-client notes for Ford, H&M, L'Oréal, Barclays, Ferrero, 3M | 6 |
 | [[wiki/concepts/_index\|concepts/]] | Atomic knowledge extracted from Claude Code sessions | 45 |
-| [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts | 3 |
+| [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts: FastAPI+Azure AD+Docker trinity, AI→cost-tracker, Apache+Vite basePath, GCP→REST polling, Box+hotfolder | 8 |
 | [[wiki/qa/_index\|qa/]] | Filed answers to queries (saved with `--file-back`) | 0 |
 | [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 38 |
 | [[wiki/web-agency/_index\|web-agency/]] | AI-assisted website building & selling: Claude Code, Nanobanana 2, Kling, LaunchPath MCP | 9 |
diff --git a/wiki/architecture/_index.md b/wiki/architecture/_index.md
index 1e6f7b2..d4e2bc4 100644
--- a/wiki/architecture/_index.md
+++ b/wiki/architecture/_index.md
@@ -21,12 +21,21 @@ Cross-cutting architectural decisions that appear in multiple Oliver projects.
 | [[wiki/architecture/hotfolder-daemon\|hotfolder-daemon]] | Box folder monitoring daemon with systemd | Ford QC, Ford SFTP |
 | [[wiki/architecture/optical-dev-server-deploy\|optical-dev-server-deploy]] | optical-dev Apache subpath pattern: single vhost, Include conf, port table, deploy script | All Oliver projects |
 | [[wiki/architecture/ai-cost-tracker\|ai-cost-tracker]] | Shared AI cost tracker: Docker Compose, Workspace→Team→Project, preflight/record HTTP API, LiteLLM pricing, hard budget limits | All Oliver projects |
+| [[wiki/architecture/new-project-checklist\|new-project-checklist]] | Step-by-step Oliver project setup — repo, Docker Compose, Azure AD, cost tracker, optical-dev deploy | All new projects |
+| [[wiki/architecture/troubleshooting-playbooks\|troubleshooting-playbooks]] | Failure → diagnosis → fix for FastAPI, Docker, React/Vite, Azure AD, Apache, PostgreSQL | All Oliver projects |
+| [[wiki/architecture/adr-log\|adr-log]] | Architecture Decision Records — why HTTP polling, Docker Compose, FastAPI, Azure AD, cost tracker were chosen | All Oliver projects |
 
 ## Key Architectural Decisions
 
 1. **Docker Compose** — default deployment for all multi-service projects on optical-dev
-2. **HTTP polling over WebSocket** — mandatory on GCP (30s LB timeout)
+2. **HTTP polling over WebSocket** — mandatory on GCP (30s LB timeout) — see [[wiki/architecture/adr-log|ADR-001]]
 3. **AI pre-structuring before RAG indexing** — improves retrieval quality
 4. **Hotfolder + archive pattern** — prevents reprocessing in Box automations
 5. **DEV_AUTH_BYPASS / dev login** — skip Azure AD in local/dev environment, real auth in production
 6. **Cost tracking as cross-cutting concern** — every AI call preflight+record via ai-cost-tracker
+
+## Quick Links
+
+- Starting a new project? → [[wiki/architecture/new-project-checklist|new-project-checklist]]
+- Something broken? → [[wiki/architecture/troubleshooting-playbooks|troubleshooting-playbooks]]
+- Why was X chosen? → [[wiki/architecture/adr-log|adr-log]]
diff --git a/wiki/architecture/adr-log.md b/wiki/architecture/adr-log.md
new file mode 100644
index 0000000..b2a8562
--- /dev/null
+++ b/wiki/architecture/adr-log.md
@@ -0,0 +1,182 @@
+---
+title: "Architecture Decision Records (ADR)"
+description: "Why specific tech choices were made at Oliver Agency — prevents relitigating decisions and documents constraints"
+tags: [architecture, decisions, adr]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# Architecture Decision Records
+
+Decisions made and why. Prevents relitigating the same choices. Each record: decision, context, alternatives considered, rationale.
+
+## Key Takeaways
+
+- Most Oliver stack choices are driven by server constraints (GCP 30s LB timeout) and team familiarity
+- Docker Compose is deliberately chosen over k8s for operational simplicity at this scale
+- FastAPI over Django/Flask: async performance + auto-generated OpenAPI docs are worth the smaller ecosystem
+- HTTP polling over WebSockets is a hard constraint, not a preference
+
+---
+
+## ADR-001: HTTP Polling over WebSockets
+**Date:** 2026-03 (from Mod Comms incident)
+**Status:** Active — applies to ALL Oliver projects
+
+**Decision:** Never use WebSockets for long-running task communication. Use HTTP polling with a job table.
+
+**Context:** Mod Comms was deployed on GCP behind a load balancer. WebSocket connections were dropped after exactly 30 seconds. The LB timeout is not configurable without GCP support escalation.
+
+**Pattern:**
+```
+POST /api/jobs      → {job_id}
+GET  /api/jobs/{id} → {status: pending|done, result?}
+Frontend polls every 2s
+```
+
+**Applies to:** All projects on optical-dev (Apache) and GCP. optical-web-1 (direct systemd) is less affected, but polling is still the safer default.
+
+See [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]].
+
+---
+
+## ADR-002: Docker Compose over Kubernetes
+**Date:** ~2025
+**Status:** Active
+
+**Decision:** Single-server Docker Compose for all Oliver project deployments.
+
+**Context:** Oliver Agency projects are internal tools and client portals, not public-scale services. Each project runs on one server with 1–3 services.
+
+**Alternatives:** k8s (Minikube, GKE), Docker Swarm, bare systemd.
+
+**Rationale:**
+- k8s adds ~3 days of ops overhead per project for no benefit at this scale
+- The entire team already understands Docker Compose
+- Rollbacks: re-run `docker compose up -d` with the previous image tag
+- optical-dev already runs 15+ Compose projects without issues
+
+**Exceptions:** Hotfolder daemons on box-cli-01 use plain systemd (CentOS 7, no Docker).
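On the client side, ADR-001's pattern reduces to a loop that re-reads the job until it reaches a terminal state. A framework-free Python sketch, useful for scripts and smoke tests (the real frontend polls in TypeScript); `fetch_status` stands in for `GET /api/jobs/{id}`, and the `failed` state and the overall deadline are assumptions beyond the ADR's `pending|done`:

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"done", "failed"}  # "failed" is an assumption; ADR-001 names pending|done


def poll_job(fetch_status: Callable[[str], Dict[str, Any]], job_id: str,
             interval: float = 2.0, timeout: float = 600.0,
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Re-read a job every `interval` seconds until it is terminal or `timeout` passes.

    Each GET is short-lived, so no single connection approaches the 30 s
    load-balancer limit, however long the job itself runs.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in a script, `fetch_status` would wrap an HTTP GET and return the parsed JSON.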
+
+---
+
+## ADR-003: FastAPI over Django/Flask
+**Date:** ~2024
+**Status:** Active
+
+**Decision:** FastAPI as the default Python backend framework.
+
+**Rationale:**
+- Async-first: handles concurrent AI API calls without blocking
+- Auto-generated OpenAPI docs (`/docs`): zero-effort API documentation
+- Pydantic models: input validation + serialization in one place
+- Performance: competitive with Node.js for I/O-bound workloads
+- Type hints throughout → fewer runtime errors
+
+**When to deviate:**
+- Admin CRUD with lots of forms → Django (but Oliver doesn't have these)
+- Very simple one-endpoint proxy → Flask is fine
+
+---
+
+## ADR-004: React + Vite over Vue / Angular / SvelteKit
+**Date:** ~2024
+**Status:** Active
+
+**Decision:** React 18 + Vite as the standard frontend stack.
+
+**Rationale:**
+- Team familiar with React; no training cost
+- Vite: fast HMR, simple `base` config for subpath deploys
+- React ecosystem: shadcn/ui, Zustand, React Query all solid
+- TypeScript + Vite: first-class support
+
+**When to deviate:**
+- Simple pages, no framework or build step needed → plain HTML/JS (3M Portal, Ferrero AC Tool)
+- SSR, image optimization, or complex routing required → Next.js
+
+---
+
+## ADR-005: Azure AD / MSAL as Auth Standard
+**Date:** ~2024
+**Status:** Active
+
+**Decision:** Azure AD SSO for all Oliver internal authenticated tools.
+
+**Context:** Oliver Agency has a Microsoft 365 tenant. All employees have Azure AD accounts.
+
+**Pattern:** MSAL.js PKCE in frontend (delegated flow) + JWKS token validation in FastAPI backend.
+
+**Local dev bypass:** `DISABLE_AUTH=true` env var skips auth middleware. Never set it in production.
+
+**Alternatives:** Auth0 (cost, external dependency), custom JWT (reinventing the wheel), Keycloak (infra overhead).
+
+See [[wiki/tech-patterns/azure-ad-msal-auth|azure-ad-msal-auth]].
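ADR-005's rule that `DISABLE_AUTH=true` never reaches production is easiest to keep if the app refuses to start with it there. A minimal sketch, assuming a hypothetical `APP_ENV` variable marks production; only `DISABLE_AUTH` comes from this wiki:

```python
import os
from collections.abc import Mapping

TRUTHY = {"1", "true", "yes", "on"}


def auth_disabled(env: Mapping = os.environ) -> bool:
    """Return True when the local-dev bypass is on; refuse outright in production.

    APP_ENV is a hypothetical variable name; use whatever marks production in the project.
    """
    disabled = env.get("DISABLE_AUTH", "").strip().lower() in TRUTHY
    if disabled and env.get("APP_ENV", "").strip().lower() == "production":
        raise RuntimeError("DISABLE_AUTH is set in production; refusing to start")
    return disabled
```

Calling it once at startup (e.g. in `main.py`, before wiring the token-validation dependency) turns a stray `DISABLE_AUTH=true` in a production `.env` into a loud failed deploy instead of a silently open API.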
+
+---
+
+## ADR-006: Cost Tracker on Every AI Project
+**Date:** 2026-04 (ai-cost-tracker launch)
+**Status:** Active
+
+**Decision:** Every Oliver project making AI API calls must integrate ai-cost-tracker with preflight + record.
+
+**Context:** AI API costs (Gemini, Claude, OpenAI) can spike unpredictably. Without tracking, budget overruns are only discovered on the monthly bill.
+
+**Integration cost:** ~30 minutes per project (3 env vars + 2 HTTP calls).
+
+**Enforcement:** `preflight()` returns `allow: false` once the budget is exceeded, so the AI call is never made and runaway costs are prevented.
+
+See [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]].
+
+---
+
+## ADR-007: Apache Single-Vhost Subpath Pattern on optical-dev
+**Date:** 2026-04 (documented from Barclays Banner Builder)
+**Status:** Active
+
+**Decision:** All projects on optical-dev share one Apache vhost. Each project gets a subpath (`/project-name/`), not a subdomain.
+
+**Context:** optical-dev has one public IP. Subdomain-per-project requires DNS management and SSL certificates; a subpath needs only an Apache config fragment.
+
+**Constraints:**
+- React apps must use `VITE_BASE_PATH` and React Router `basename`
+- All API calls must include the subpath prefix
+- Include directive order matters — specific paths before catch-alls
+
+See [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]].
+
+---
+
+## ADR-008: Gemini over GPT for Barclays / GCP Projects
+**Date:** 2026-03 (Mod Comms)
+**Status:** Active for GCP-deployed projects
+
+**Decision:** Prefer Google Gemini as AI provider for projects deployed on GCP.
+
+**Rationale:** Google-to-Google latency advantage. GCP service account auth is simpler than API key rotation. Gemini Pro + Flash fallback gives cost/quality control.
+
+**When to use Claude/OpenAI instead:** the client specifies it (PIMCO uses Claude API), the task needs stronger coding ability, or the project is on optical-web-1 / optical-dev (neutral infrastructure).
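ADR-006's preflight → call → record sequence can live in one wrapper per project. A minimal sketch with the HTTP layer injected: `preflight` and `record` stand in for the tracker's two HTTP calls, and every payload field except `allow` (`source_app`, `model`, `input_tokens`, `output_tokens`) is an assumption; check cost-tracker-integration for the real request shapes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when preflight refuses the call."""


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


def tracked_ai_call(preflight: Callable[[Dict[str, Any]], Dict[str, Any]],
                    record: Callable[[Dict[str, Any]], None],
                    call_model: Callable[[], Tuple[Any, Usage]],
                    source_app: str, model: str) -> Any:
    """Preflight, then call the model only if allowed, then record actual usage."""
    decision = preflight({"source_app": source_app, "model": model})
    if not decision.get("allow", False):
        # Hard budget limit: the provider is never called, so nothing is spent.
        raise BudgetExceeded(f"{source_app}: budget exhausted for {model}")
    result, usage = call_model()
    record({"source_app": source_app, "model": model,
            "input_tokens": usage.input_tokens, "output_tokens": usage.output_tokens})
    return result
```

Treating a missing `allow` as a refusal (fail closed) is a design choice of this sketch: a misbehaving tracker cannot silently lift the budget limit, at the cost of blocking AI calls while the tracker is unreachable.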
+
+---
+
+## ADR-009: Node.js Proxy for One2Edit / Simple Portals
+**Date:** ~2024
+**Status:** Active
+
+**Decision:** Use Node.js + vanilla JS (no framework, no build step) for simple CORS proxy portals.
+
+**Context:** One2Edit API doesn't support CORS. H&M and 3M portals need to proxy requests to `oliver.one2edit.com`.
+
+**Rationale:** No build pipeline = easier to deploy and debug. Vanilla JS works fine for 3-page portals. The Express proxy is about 30 lines of Node.js.
+
+**Pattern:** Static files served by Node + `/api/*` proxied to external API. See [[wiki/tech-patterns/nodejs-vanilla-proxy|nodejs-vanilla-proxy]].
+
+---
+
+## Related
+
+- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]]
+- [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]]
+- [[wiki/architecture/new-project-checklist|new-project-checklist]]
+- [[wiki/tech-patterns/_index|tech-patterns]] — all pattern articles
diff --git a/wiki/architecture/new-project-checklist.md b/wiki/architecture/new-project-checklist.md
new file mode 100644
index 0000000..865ed80
--- /dev/null
+++ b/wiki/architecture/new-project-checklist.md
@@ -0,0 +1,221 @@
+---
+title: "New Oliver Project — Setup Checklist"
+description: "Step-by-step checklist for starting any new Oliver Agency project — from Bitbucket to deployed on optical-dev"
+tags: [architecture, checklist, deployment, setup]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# New Oliver Project — Setup Checklist
+
+## Key Takeaways
+
+- Always use Docker Compose — it's the default for all multi-service projects
+- No WebSockets on GCP or optical-dev — use HTTP job polling from day 1
+- Add cost-tracker from the start if any AI API calls exist
+- Azure AD MSAL is the Oliver standard for authenticated tools
+- Create the Obsidian project note before writing any code
+
+---
+
+## Step 0 — Decide on Stack
+
+Use the decision guide from [[wiki/tech-patterns/_index|tech-patterns]]:
+
+```
+New project → what stack?
+├── Complex AI platform, multi-user → nextjs-fastapi-fullstack
+├── Standard tool with UI → fastapi-python-docker + react-vite-typescript
+├── Simple client portal / proxy → nodejs-vanilla-proxy
+└── Static page, no backend → plain HTML/JS
+Any stack: needs auth? → always azure-ad-msal-auth
+Any stack: has AI calls? → always add cost-tracker-integration
+```
+
+---
+
+## Step 1 — Create Obsidian Project Note
+
+Do this before writing code: the note is where project context lives.
+
+```bash
+# Create the project folder in the vault, then add the note inside it
+mkdir -p ~/Library/Mobile\ Documents/iCloud~md~obsidian/Documents/VadymSamoilenko/01\ Projects/{project-name}/
+```
+
+Fill in frontmatter: `client`, `status`, `tech`, `local_path`, `server`, `deploy`, `url`.
+
+---
+
+## Step 2 — Repo and Local Folder
+
+```bash
+# Oliver uses Bitbucket (or GitHub for some projects)
+# Local project root
+mkdir /Users/ai_leed/Documents/Projects/Oliver/{project-name}
+cd /Users/ai_leed/Documents/Projects/Oliver/{project-name}
+git init
+
+# Standard .gitignore (printf, not echo: bash's echo does not expand \n)
+printf '%s\n' .env '.env.*' '!.env.example' __pycache__/ node_modules/ dist/ .deploy_state/ > .gitignore
+```
+
+---
+
+## Step 3 — Project Structure
+
+### FastAPI + React (standard)
+```
+{project}/
+├── backend/
+│   ├── app/
+│   │   ├── main.py
+│   │   ├── routers/
+│   │   ├── models.py
+│   │   └── database.py
+│   ├── requirements.txt
+│   ├── Dockerfile
+│   └── .env.example
+├── frontend/
+│   ├── src/
+│   ├── package.json
+│   ├── vite.config.ts   ← must include base: VITE_BASE_PATH
+│   └── .env.example
+├── docker-compose.yml
+├── docker-compose.prod.yml
+├── deploy.sh
+└── CLAUDE.md
+```
+
+### CLAUDE.md minimum
+```markdown
+# {Project Name}
+- Server: optical-dev | optical-web-1
+- Deploy: bash deploy.sh
+- Stack: FastAPI (port 8XXX), React (Vite, /project-path/ subpath)
+- Local: docker compose up --build
+- No WebSockets — use HTTP polling for any async work
+```
+
+---
+
+## Step 4 — Docker Compose
+
+```yaml
+# docker-compose.yml (local dev)
+services:
+  api:
+    build: ./backend
+    ports: ["8010:8000"]   # pick a free host port from the optical-dev port table
+    env_file: ./backend/.env
+    depends_on: [db]       # start order only; does not wait for Postgres to accept connections
+
+  db:
+    image: postgres:16
+    environment:
+      POSTGRES_USER: ${POSTGRES_USER}
+      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
+      POSTGRES_DB: ${POSTGRES_DB}
+    volumes: ["pgdata:/var/lib/postgresql/data"]
+
+  frontend:
+    build: ./frontend
+    ports: ["5173:5173"]
+
+volumes:
+  pgdata:
+```
+
+For prod (`docker-compose.prod.yml`): bind the API to `127.0.0.1:8010:8000` so only local processes (Apache) can reach it; never publish container ports on all interfaces.
+
+---
+
+## Step 5 — Azure AD Auth (if needed)
+
+Required for every authenticated Oliver internal tool. See [[wiki/tech-patterns/azure-ad-msal-auth|azure-ad-msal-auth]].
+
+**Backend `.env.example`:**
+```env
+AZURE_TENANT_ID=
+AZURE_CLIENT_ID=
+AZURE_CLIENT_SECRET=
+DISABLE_AUTH=true   # local dev bypass
+```
+
+**Frontend `.env.example`:**
+```env
+VITE_AZURE_CLIENT_ID=
+VITE_AZURE_TENANT_ID=
+VITE_BACKEND_URL=http://localhost:8010
+VITE_BASE_PATH=/project-name   # for optical-dev subpath
+```
+
+---
+
+## Step 6 — Cost Tracker (if AI calls)
+
+Add from day 1: integration takes about 30 minutes and prevents budget surprises. See [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]].
+
+```env
+COST_TRACKER_BASE_URL=https://optical-dev.oliver.solutions/cost-tracker/v1
+COST_TRACKER_API_KEY=ct_live_xxx   # get from admin UI
+COST_TRACKER_SOURCE_APP={project-name}
+```
+
+---
+
+## Step 7 — Async Work Pattern (no WebSockets)
+
+The GCP load balancer kills connections after 30s. Celery is overkill for simple cases; use the job table pattern:
+
+```python
+# POST /api/jobs      → creates DB row, starts background task → returns {job_id}
+# GET  /api/jobs/{id} → returns {status, result}
+# Frontend polls every 2s until status == "done"
+```
+
+Only add Redis + Celery if you need multiple workers, task retries, or queue-length monitoring. See [[wiki/tech-patterns/redis-celery-worker-queue|redis-celery-worker-queue]].
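The Step 7 job table, reduced to its state machine. A stdlib-only sketch: an in-memory dict and a thread stand in for the DB row and the FastAPI background task, so `JobTable`, `create`, and `get` are illustrative names, not an existing module. In a real service `create` would back `POST /api/jobs` and `get` would back `GET /api/jobs/{id}`.

```python
import threading
import uuid
from typing import Any, Callable, Dict


class JobTable:
    """In-memory stand-in for a jobs table: job_id -> {status, result, error}."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def create(self, work: Callable[[], Any]) -> str:
        """POST /api/jobs: insert a pending row, start the work, return the id immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "pending", "result": None, "error": None}
        threading.Thread(target=self._run, args=(job_id, work), daemon=True).start()
        return job_id

    def _run(self, job_id: str, work: Callable[[], Any]) -> None:
        try:
            update = {"status": "done", "result": work()}
        except Exception as exc:  # store the failure so pollers stop waiting
            update = {"status": "failed", "error": str(exc)}
        with self._lock:
            self._jobs[job_id].update(update)

    def get(self, job_id: str) -> Dict[str, Any]:
        """GET /api/jobs/{id}: a snapshot of the row (a KeyError would map to a 404)."""
        with self._lock:
            return dict(self._jobs[job_id])
```

An in-memory dict loses jobs on restart and is not shared between worker processes, which is why Step 7 stores the job as a DB row; the state transitions (`pending` → `done` or `failed`) stay the same.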
+
+---
+
+## Step 8 — Deploy to optical-dev
+
+See [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]] for full details.
+
+```bash
+# Pick a free port (8011–8039 range is safe)
+# Create deploy/apache-{project}.conf fragment
+# Run deploy.sh on server
+ssh optical-dev "cd /opt/{project} && git pull && bash deploy.sh"
+```
+
+Minimum `deploy.sh` structure:
+1. `git pull origin main`
+2. Docker image rebuild (hash-based cache)
+3. `docker compose -f docker-compose.prod.yml up -d`
+4. Postgres readiness wait → `alembic upgrade head`
+5. `npm run build` with `VITE_BASE_PATH`
+6. `rsync dist/ → /var/www/html/{project}/`
+7. Apache Include injection (idempotent)
+8. `apache2ctl configtest && systemctl reload apache2`
+
+---
+
+## Step 9 — Project Note Final Update
+
+After the first deploy, update the Obsidian note:
+- `url:` → actual deployment URL
+- `server:` → `optical-dev` or `optical-web-1`
+- `deploy:` → exact command
+- Add first session entry
+
+---
+
+## Related
+
+- [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]] — Apache patterns, port table, deploy scripts
+- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]] — no WebSockets rule
+- [[wiki/tech-patterns/azure-ad-msal-auth|azure-ad-msal-auth]] — auth setup
+- [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — AI cost tracking
+- [[wiki/tech-patterns/fastapi-python-docker|fastapi-python-docker]] — backend patterns
+- [[wiki/tech-patterns/react-vite-typescript|react-vite-typescript]] — frontend patterns
diff --git a/wiki/architecture/troubleshooting-playbooks.md b/wiki/architecture/troubleshooting-playbooks.md
new file mode 100644
index 0000000..4f537f9
--- /dev/null
+++ b/wiki/architecture/troubleshooting-playbooks.md
@@ -0,0 +1,256 @@
+---
+title: "Troubleshooting Playbooks"
+description: "Failure pattern → diagnosis → fix for FastAPI, Docker, React/Vite, Azure AD, Apache, PostgreSQL across Oliver projects"
+tags: [architecture, troubleshooting, debugging, fastapi, docker, apache]
+created: 2026-04-27
+updated: 2026-04-27
+---
+
+# Troubleshooting Playbooks
+
+Quick lookup: symptom → confirmed root cause → exact fix. Updated from real incidents.
+
+## Key Takeaways
+
+- Most 502s on optical-dev are Apache config errors, not dead containers
+- Container "running" ≠ app healthy — always check `docker logs`
+- Azure AD 403s are usually wrong auth flow type, not wrong permissions
+- `alembic upgrade head` silently succeeds even if DB isn't ready — always wait for pg_isready first
+- Next.js basePath must be in ALL redirect strings — middleware doesn't auto-prepend
+
+---
+
+## FastAPI
+
+### 500 Internal Server Error on startup
+
+```bash
+docker logs {container} --tail 50
+```
+
+Common causes:
+| Symptom | Cause | Fix |
+|---------|-------|-----|
+| `ModuleNotFoundError` | Missing package in requirements.txt | `pip install X && pip freeze > requirements.txt`, then rebuild the image |
+| `Could not connect to postgres` | DB container not ready | Add a pg_isready loop before starting uvicorn |
+| `pydantic ValidationError` on startup | Missing env var | Diff `.env` against `.env.example` |
+| `Address already in use` | Port conflict | Check `docker ps` + port table |
+
+### CORS errors from browser
+
+```python
+# app/main.py — must list exact origins, no trailing slash
+from fastapi.middleware.cors import CORSMiddleware
+app.add_middleware(CORSMiddleware,
+    allow_origins=["http://localhost:5173", "https://optical-dev.oliver.solutions"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+```
+
+Never use `allow_origins=["*"]` with `allow_credentials=True`: browsers reject a wildcard origin on credentialed requests.
+
+### `max_tokens` rejected by OpenAI
+
+OpenAI's reasoning models (`o1`, `o3`) reject `max_tokens`; use `max_completion_tokens` instead (`max_tokens` is also deprecated for the other Chat Completions models). See [[wiki/concepts/openai-max-completion-tokens|openai-max-completion-tokens]].
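Both CORS rules in the FastAPI section above follow from how the browser checks a response. A simplified Python model of the Fetch standard's CORS check, for illustration only (it ignores preflight requests and default-port normalization); the function names are made up:

```python
from typing import Optional
from urllib.parse import urlsplit


def page_origin(url: str) -> str:
    """scheme://host[:port] of a page URL: no path, no trailing slash (default ports not normalized)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def cors_allows(origin: str, allow_origin: Optional[str],
                allow_credentials: Optional[str], with_credentials: bool) -> bool:
    """Simplified CORS check: does the browser hand this response to the page?"""
    if allow_origin is None:
        return False
    if not with_credentials:
        return allow_origin in ("*", origin)
    # Credentialed request: "*" is not accepted, the origin must match exactly,
    # and Access-Control-Allow-Credentials must be the literal "true".
    return allow_origin == origin and allow_credentials == "true"
```

A page under `https://optical-dev.oliver.solutions/barclays-banner-builder/` has the origin `https://optical-dev.oliver.solutions`, so an allowed-origin entry with a trailing slash never matches.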
+
+---
+
+## Docker / Docker Compose
+
+### Container exits immediately (exit code 1)
+
+```bash
+docker compose logs api   # see actual error
+docker compose ps         # check status
+```
+
+### `docker compose up` keeps running an old image despite code changes
+
+```bash
+docker compose build --no-cache api   # force a clean rebuild
+docker compose up -d
+```
+
+### Volume permissions error (PostgreSQL won't start)
+
+```bash
+docker compose down -v   # destroy volumes (loses data in dev)
+docker compose up -d
+```
+
+### Build fails with `COPY failed: file not found`
+
+`COPY` paths are resolved relative to the build context, not to the Dockerfile's location. Typical mistake: `COPY requirements.txt .` when the build context is the repo root but requirements.txt is in `backend/`. Either use `COPY backend/requirements.txt .` or point the context at `backend/`:
+
+```yaml
+# docker-compose.yml
+services:
+  api:
+    build:
+      context: ./backend       # directory that contains requirements.txt
+      dockerfile: Dockerfile   # resolved relative to the context
+```
+
+### Port already in use on server
+
+```bash
+sudo ss -tlnp | grep :8010   # find who holds the port
+docker ps --format "{{.Names}} {{.Ports}}"
+```
+
+---
+
+## React / Vite
+
+### Blank page after deploy to optical-dev subpath
+
+Missing `VITE_BASE_PATH` in the build, or missing React Router `basename`. See [[wiki/concepts/nextjs-basepath-auth-redirects|nextjs-basepath-auth-redirects]].
+
+```bash
+# Build must include base path
+VITE_BASE_PATH=/my-project npm run build
+```
+
+```ts
+// vite.config.ts
+base: process.env.VITE_BASE_PATH ?? "/",
+
+// main.tsx
+<BrowserRouter basename={import.meta.env.VITE_BASE_PATH}>
+```
+
+### API calls return 404 in prod (work in dev)
+
+All fetch calls must include the subpath prefix in prod:
+
+```ts
+const API = import.meta.env.VITE_BASE_PATH ?? "";
+fetch(`${API}/api/jobs`)   // NOT: fetch("/api/jobs")
+```
+
+### TypeScript build fails on CI but not locally
+
+Usually code that never went through a full type-check locally: the Vite dev server only transpiles TypeScript and does not type-check it. Run:
+
+```bash
+npx tsc --noEmit   # full type-check without emitting files
+```
+
+Common: unused variables → remove them (unused parameters can be prefixed with `_`). Import of non-existent export → check barrel files.
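For the "Port already in use on server" case above, and for choosing a port from the 8011–8039 range the new-project checklist suggests, a quick check can be scripted. A minimal sketch, meant to run on the server itself: it tries to bind each candidate on 127.0.0.1 and returns the first that succeeds. A successful bind only means nothing holds the port right now; a stopped Compose project may still claim it in its YAML, so cross-check the optical-dev port table.

```python
import socket
from typing import Iterable, Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if a TCP socket can bind host:port right now (i.e. nothing holds it)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # EADDRINUSE and similar
            return False
    return True


def first_free_port(candidates: Iterable[int], host: str = "127.0.0.1") -> Optional[int]:
    """Return the first bindable port among the candidates, or None if all are taken."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    return None
```

`first_free_port(range(8011, 8040))` covers 8011–8039 inclusive.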
+
+---
+
+## Azure AD / MSAL
+
+### 403 `ErrorAccessDenied` from Graph API
+
+Usually not a missing permission but the wrong auth flow. See [[wiki/connections/graph-api-vs-msal-app-vs-delegated|graph-api-vs-msal-app-vs-delegated]].
+
+- If **script/server task** (no user) → use app-only (client credentials) with Application permissions
+- If **user-facing app** → use delegated (MSAL PKCE) with Delegated permissions
+
+### `state mismatch` on MSAL redirect
+
+See [[wiki/concepts/msal-vanilla-js-pkce|msal-vanilla-js-pkce]]. Causes:
+- Multiple MSAL instances initialized (check for duplicate `new PublicClientApplication()`)
+- Redirect URI in Azure Portal doesn't exactly match (trailing slash matters)
+- Wrong platform type in Azure Portal — SPAs need **Single Page Application** platform, not Web
+
+### Azure AD login works locally but fails on server
+
+The redirect URIs registered in Azure Portal must also include the production URL. Checklist:
+1. Azure Portal → App Registration → Authentication → Redirect URIs
+2. Add `https://optical-dev.oliver.solutions/project/auth/callback`
+3. `VITE_AZURE_REDIRECT_URI` in production `.env` must match exactly
+
+### `DISABLE_AUTH=true` not working on server
+
+Check that the variable is in the `.env` file the service actually loads, not the root `.env`: Compose uses the root `.env` only for `${VAR}` substitution in the YAML and does not pass it into containers. In Docker Compose:
+```yaml
+env_file: ./backend/.env   # not ./.env
+```
+
+---
+
+## Apache on optical-dev
+
+### 502 Bad Gateway
+
+Almost always Apache config, not a dead container. See [[wiki/concepts/proxmox-container-502-misdiagnosis|proxmox-container-502-misdiagnosis]] for the general pattern.
+
+Checklist:
+```bash
+# 1. Is the container actually running?
+ssh optical-dev "docker ps | grep {project}"
+
+# 2. Is the app listening on the expected port?
+ssh optical-dev "curl -s http://localhost:8010/health"
+
+# 3. Is Apache config valid?
+ssh optical-dev "sudo apache2ctl configtest"
+
+# 4. Check Apache error log
+ssh optical-dev "sudo tail -50 /var/log/apache2/error.log"
+```
+
+Common config mistakes:
+- `ProxyPass /project/` (instead of `/project/api/`) alongside `Alias /project` → the proxy captures static files too → 502 on static files
+- Missing `proxy_http` module: `sudo a2enmod proxy proxy_http && sudo systemctl restart apache2` (newly enabled modules need a restart)
+- Include fragment added to wrong location in vhost (after `</VirtualHost>`)
+
+### 404 on SPA deep links (works on `/` but not `/project/dashboard`)
+
+SPA routing needs `FallbackResource` or a RewriteRule that serves index.html:
+
+```apache
+<Directory /var/www/html/my-project>
+  RewriteEngine On
+  RewriteBase /my-project/
+  RewriteCond %{REQUEST_FILENAME} !-f
+  RewriteCond %{REQUEST_FILENAME} !-d
+  RewriteRule ^ index.html [L]
+</Directory>
+```
+
+### Apache Include ordering conflict
+
+If a previous project's config has a catch-all `ProxyPass / http://...`, it intercepts every request, including your new project's: ProxyPass rules are checked in configuration order and the first match wins. Fix: reorder the Include lines so your project's specific paths come before catch-alls.
+
+```bash
+# Check current Include order
+ssh optical-dev "grep Include /etc/apache2/sites-available/optical-dev.oliver.solutions.conf"
+```
+
+---
+
+## PostgreSQL / Alembic
+
+### Alembic `Can't locate revision` error
+
+The revision recorded in the DB (`alembic_version`) no longer exists among the migration files. Fix:
+```bash
+docker compose exec api alembic history      # see all revisions
+docker compose exec api alembic current      # see DB state
+docker compose exec api alembic stamp head   # mark as current without running migrations (use carefully)
+```
+
+### Migration silently does nothing
+
+Postgres container isn't ready when alembic runs. Add a readiness check:
+```bash
+until docker compose exec -T db pg_isready -U "$POSTGRES_USER"; do sleep 1; done
+docker compose exec -T api alembic upgrade head
+```
+
+### Column already exists error on re-deploy
+
+Migration is not idempotent. Use `ADD COLUMN IF NOT EXISTS` in raw SQL migrations, or check `sqlalchemy.inspect(op.get_bind()).get_columns(table)` before `op.add_column` in Alembic ops.
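The "check before you alter" fix for the column-already-exists case, as a runnable stdlib sketch. It uses `sqlite3` and `PRAGMA table_info` purely so it runs anywhere; in an Alembic migration against PostgreSQL the same lookup would be `sqlalchemy.inspect(op.get_bind()).get_columns(table)` before `op.add_column`. The table and column names in any call are hypothetical.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Check the live schema instead of assuming an earlier deploy never added the column."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def add_column_if_missing(conn: sqlite3.Connection, table: str,
                          column: str, ddl_type: str) -> bool:
    """Add the column only when absent, so re-running the migration is harmless.

    Identifiers are interpolated directly: they must come from migration code, never user input.
    Returns True if the table was altered.
    """
    if column_exists(conn, table, column):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True
```

Running the same migration twice then alters the table once and is a no-op the second time, which is the property the re-deploy case needs.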
+ +--- + +## Related + +- [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]] — Apache patterns +- [[wiki/architecture/new-project-checklist|new-project-checklist]] — setup guide +- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]] — no WebSockets +- [[wiki/tech-patterns/azure-ad-msal-auth|azure-ad-msal-auth]] — Azure AD setup +- [[wiki/concepts/monorepo-deploy-script-pitfall|monorepo-deploy-script-pitfall]] — deploy.sh gotchas diff --git a/wiki/client-knowledge/3m.md b/wiki/client-knowledge/3m.md new file mode 100644 index 0000000..189ce85 --- /dev/null +++ b/wiki/client-knowledge/3m.md @@ -0,0 +1,69 @@ +--- +title: "Client Knowledge: 3M" +description: "3M-specific context: OMG Portal, One2Edit API proxy, two-step auth, embedded editor" +tags: [client-knowledge, 3m] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Client Knowledge: 3M + +## Key Takeaways + +- One active project: 3M OMG Portal — Node.js CORS proxy for One2Edit translation management +- One2Edit has no CORS headers — all API calls must go through the Node.js proxy +- Two-step auth: service account for job listing, externSessionId for embedded editor +- No build step, no database — plain HTML/JS pages + +--- + +## Projects + +| Project | Server | Stack | Status | Purpose | +|---------|--------|-------|--------|---------| +| [[01 Projects/3m-portal/3M OMG Portal\|3M OMG Portal]] | baic (web-03) | Node.js + Vanilla JS + One2Edit API | active | Translation job management portal wrapping One2Edit | + +--- + +## OMG Portal — Key Facts + +**What it does:** 3-page portal wrapping `oliver.one2edit.com` API for managing 3M translation jobs + +**Page flow:** +1. `login.html` — Two-step: username → userId, then externSessionId +2. `dashboard.html` — Job list (STARTED/RUNNING status), progress bars, PDF export +3. 
`editor.html` — Embedded One2Edit JS SDK using externSessionId + +**Two auth modes:** +- **Service account** (`portal@oliver.agency`): used for job listing — gets stable userId +- **Session-based** (`externSessionId`): used for embedded editor — expires after session + +**One2Edit API:** `https://oliver.one2edit.com/v3/Api.php` — CORS blocked, so all calls are proxied through `localhost:3000/api`. + +**Proxy behavior in server.js:** +- Strips/rewrites Location headers on 301/302 → returns 401 (prevents auth redirect loops) +- Injects CORS headers on all `/api` responses +- Masks passwords in server logs + +**Dev start:** +```bash +npm start # or: node server.js → http://localhost:3000 +``` + +No build step — edit HTML/JS files directly. + +--- + +## One2Edit Platform Notes + +Same platform used by H&M. See [[wiki/client-knowledge/hm|H&M client knowledge]] and [[wiki/tech-patterns/one2edit-api|one2edit-api]] for full API details. + +**Key quirk:** `sessionStorage` is used (not localStorage), so the session is cleared when the browser tab is closed. Users must log in again in each new tab or browser session. This is intentional for security. + +--- + +## Related + +- [[wiki/tech-patterns/one2edit-api|one2edit-api]] — One2Edit API patterns (shared with H&M) +- [[wiki/tech-patterns/nodejs-vanilla-proxy|nodejs-vanilla-proxy]] — Node.js proxy pattern +- [[wiki/client-knowledge/hm|hm]] — H&M uses the same One2Edit platform diff --git a/wiki/client-knowledge/_index.md b/wiki/client-knowledge/_index.md index 871101f..e521289 100644 --- a/wiki/client-knowledge/_index.md +++ b/wiki/client-knowledge/_index.md @@ -17,6 +17,9 @@ Per-client notes for clients with 2+ active projects.
Covers tech preferences, c | Ford | Ford QC, Ford SFTP | Box API, SFTP, systemd, Python | [[wiki/client-knowledge/ford\|ford]] | | H&M | O2E Tool, EMS Report | One2Edit API, Python, JSON | [[wiki/client-knowledge/hm\|hm]] | | L'Oréal | Global Kickoff, SLA Calculator | Box API, PHP, Make.com, Docker | [[wiki/client-knowledge/loreal\|loreal]] | +| Barclays | Mod Comms, Banner Builder | FastAPI, React, Gemini, GCP, Docker | [[wiki/client-knowledge/barclays\|barclays]] | +| Ferrero | AC Booking Tool | Node.js, Box API, CSV/OMG | [[wiki/client-knowledge/ferrero\|ferrero]] | +| 3M | OMG Portal | Node.js, Vanilla JS, One2Edit proxy | [[wiki/client-knowledge/3m\|3m]] | ## Single-Project Clients These clients have only one project — context lives in the project note: diff --git a/wiki/client-knowledge/barclays.md b/wiki/client-knowledge/barclays.md new file mode 100644 index 0000000..255bc8b --- /dev/null +++ b/wiki/client-knowledge/barclays.md @@ -0,0 +1,99 @@ +--- +title: "Client Knowledge: Barclays" +description: "Barclays-specific context: projects, tech constraints, deployment quirks, and lessons learned" +tags: [client-knowledge, barclays] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Client Knowledge: Barclays + +## Key Takeaways + +- Two active projects: Mod Comms (GCP, multi-agent AI) and Banner Builder (optical-dev, React+FastAPI) +- Barclays requires strict brand compliance — logo versions matter, Barclays design tokens used in UI +- GCP deployment = no WebSockets — REST polling is mandatory for Mod Comms +- Banner Builder uses Zustand for workflow state management (journey store pattern) + +--- + +## Projects + +| Project | Server | Stack | Status | Purpose | +|---------|--------|-------|--------|---------| +| [[01 Projects/modcomms/Mod Comms\|Mod Comms]] | GCP | FastAPI + React + Gemini + PostgreSQL | active | AI proof review — compliance/brand/tone/channel checks | +| [[01 Projects/Barclays-banner-builder/Barclays Banner Builder\|Banner Builder]] | 
optical-dev | FastAPI + React + PostgreSQL + Docker | active | AI banner generation tool — Brief → Variants → Edit → Export | + +--- + +## Mod Comms — Key Facts + +**What it does:** Upload proof (image/PDF) → 4 AI agents analyze in parallel → lead agent synthesizes verdict + +**4 agents:** Legal compliance, Brand adherence, Tone of Voice, Channel suitability + +**AI:** Google Gemini Pro (primary) + Flash (fallback) — chosen for GCP co-location + +**Critical incident (2026-03-18):** WebSocket connections dropped at 30s on GCP LB → switched to REST polling. See [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]]. + +**Auth:** Azure AD (MSAL) — uses `DISABLE_AUTH=true` locally + +**Dev start:** +```bash +# Backend +cd backend && uvicorn app.main:app --reload --host 0.0.0.0 --port 8000 + +# Frontend +cd frontend && npm install && npm run dev + +# DB migrations +cd backend && alembic upgrade head +``` + +**Env vars (backend):** +``` +GEMINI_API_KEY= +DATABASE_URL=postgresql+asyncpg://user:pass@localhost:5432/modcomms +AZURE_TENANT_ID= +AZURE_CLIENT_ID= +DISABLE_AUTH=true +``` + +--- + +## Banner Builder — Key Facts + +**What it does:** AI-assisted banner creation. Workflow: Brief → Edit Variants → Banner Editor → Export CSV/PDF + +**Workflow state:** Managed with Zustand `journey store` — backward navigation allowed, forward steps grayed out until completed. See [[wiki/concepts/export-endpoint-filter-pattern|export-endpoint-filter-pattern]]. + +**Export quirk:** PDF/CSV exports must receive `variant_ids` from frontend — backend cannot infer selection. Always pass explicitly. + +**Deploy:** optical-dev at `/barclays-banner-builder/` subpath. Deploy via `bash deploy.sh` on server. + +**Apache config:** Barclays Include fragment at `/opt/barclays-banner-builder/deploy/apache-barclays.conf`. Port: 8010. 
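The export quirk above (the backend exports only the `variant_ids` the frontend sends and never guesses the selection) can be sketched as a pure function. This is a minimal illustration, not the Banner Builder code: the `Variant` fields and the CSV columns are hypothetical placeholders.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical fields; the real Banner Builder model will differ.
    id: str
    headline: str
    size: str


def select_variants(variants: list[Variant], variant_ids: list[str]) -> list[Variant]:
    """Return exactly the requested variants, in the order the frontend sent them.

    An empty selection is an error rather than "export everything",
    because the backend cannot infer what the user selected.
    """
    if not variant_ids:
        raise ValueError("variant_ids is required for export")
    by_id = {v.id: v for v in variants}
    missing = [vid for vid in variant_ids if vid not in by_id]
    if missing:
        raise KeyError(f"unknown variant ids: {missing}")
    return [by_id[vid] for vid in variant_ids]


def export_csv(variants: list[Variant], variant_ids: list[str]) -> str:
    """Render only the selected variants as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "headline", "size"])
    for v in select_variants(variants, variant_ids):
        writer.writerow([v.id, v.headline, v.size])
    return buf.getvalue()
```

Failing loudly on an empty or unknown selection keeps a frontend bug from silently producing an export of the wrong banners.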
+ +**Critical incident (2026-04-17):** Apache Include directive ordering — Banner Builder's conf was loading after hp-prod-tracker's catch-all `ProxyPass / http://...`, which intercepted all requests. Fixed by reordering Include lines in vhost config. + +**Stack:** +- Frontend: React + TypeScript + Vite + Zustand +- Backend: FastAPI + Python + Alembic + PostgreSQL +- Auth: Azure AD (MSAL) +- Deploy: Docker Compose + Apache subpath + +--- + +## Brand Requirements + +- Logo versions matter — track which version is active (`v4`, `v5`, `v6`) +- Barclays design tokens used in UI (Zustand journey stepper used Barclays color tokens) +- Export outputs go to OMG media booking system — format must be exact + +--- + +## Related + +- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]] — WebSocket → REST polling +- [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]] — Banner Builder deployment +- [[wiki/tech-patterns/python-ai-agents|python-ai-agents]] — multi-agent pattern used in Mod Comms +- [[wiki/concepts/export-endpoint-filter-pattern|export-endpoint-filter-pattern]] — variant_ids in exports diff --git a/wiki/client-knowledge/ferrero.md b/wiki/client-knowledge/ferrero.md new file mode 100644 index 0000000..c4923ff --- /dev/null +++ b/wiki/client-knowledge/ferrero.md @@ -0,0 +1,53 @@ +--- +title: "Client Knowledge: Ferrero" +description: "Ferrero-specific context: AC Booking Tool, Box API, OMG CSV workflow" +tags: [client-knowledge, ferrero] +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Client Knowledge: Ferrero + +## Key Takeaways + +- One active project: AC Booking Tool — browser-based, outputs CSV for OMG media booking system +- No Docker, no database — the simplest possible stack (Node.js + HTML/JS) +- Box API integration for "Send to OMG" folder saves +- CSV download works without the Node server — can open index.html directly + +--- + +## Projects + +| Project | Server | Stack | Status | Purpose | 
+|---------|--------|-------|--------|---------| +| [[01 Projects/ferrero-ac-creator/Ferrero AC Booking Tool\|AC Booking Tool]] | optical-web-1 | Node.js + HTML/JS + Box API | active | Generate CSV files for OMG media booking import | + +--- + +## AC Booking Tool — Key Facts + +**What it does:** Browser form for creating Ferrero communication asset bookings → export as CSV → import into OMG (media booking system) + +**Two modes:** +1. **Standalone:** open `index.html` directly — CSV download works without the server +2. **With server:** `node server.js` → `http://localhost:3456` — enables "Send to OMG" (Box API folder save) + +**Box API:** Used for saving directly to a predefined Ferrero Box folder. Uses Box OAuth or service account credentials. See [[wiki/tech-patterns/box-api-integration|box-api-integration]]. + +**No build step:** Plain HTML/JS, no transpilation. Edit the files directly. + +**Deploy:** `node server.js` on optical-web-1. No Docker, no reverse proxy: the Node server listens directly on its own port. + +--- + +## OMG CSV Format + +The CSV output format is dictated by the OMG media booking system's import requirements. Fields are Ferrero-specific (AC codes, booking dates, media channels). **Do not change column structure without confirming with the client** — OMG imports are fragile to schema changes.
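Because OMG imports break on column changes, one defensive option is to pin the header in code and refuse any row that does not match it exactly. A minimal sketch; the column names below are hypothetical stand-ins for the real Ferrero/OMG fields, which must come from the client's import spec:

```python
import csv
import io

# Hypothetical column list; replace with the exact OMG import header.
OMG_COLUMNS = ("ac_code", "booking_date", "media_channel")


def build_omg_csv(rows: list[dict[str, str]]) -> str:
    """Write rows under a fixed header; reject rows with missing or extra keys."""
    buf = io.StringIO()
    # extrasaction="raise" makes DictWriter raise ValueError on unexpected keys.
    writer = csv.DictWriter(buf, fieldnames=OMG_COLUMNS, extrasaction="raise")
    writer.writeheader()
    for i, row in enumerate(rows):
        missing = set(OMG_COLUMNS) - set(row)
        if missing:
            # DictWriter would silently fill gaps with restval (""), so check explicitly.
            raise ValueError(f"row {i} missing columns: {sorted(missing)}")
        writer.writerow(row)
    return buf.getvalue()
```

Keeping the column tuple in one place means a schema change shows up as a single reviewed diff rather than a drift between form fields and export code.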
+ +--- + +## Related + +- [[wiki/tech-patterns/box-api-integration|box-api-integration]] — Box API patterns +- [[wiki/tech-patterns/nodejs-vanilla-proxy|nodejs-vanilla-proxy]] — Node.js + vanilla JS pattern diff --git a/wiki/connections/_index.md b/wiki/connections/_index.md index 3baeebc..4d1e2bd 100644 --- a/wiki/connections/_index.md +++ b/wiki/connections/_index.md @@ -8,6 +8,11 @@ | [[wiki/connections/oauth-state-mismatch-debugging]] | LibreChat OpenID ↔ MSAL SPA ↔ Azure AD — state mismatch root cause shared across implementations | daily/2026-04-15.md | 2026-04-15 | | [[wiki/connections/graph-api-vs-msal-app-vs-delegated]] | Graph API app-only ↔ MSAL delegated — choosing the right Azure AD auth flow; why delegated 403s on shared mailboxes | daily/2026-04-16.md | 2026-04-16 | | [[wiki/connections/lxc-networking-api-failures]] | LXC ARP cache ↔ Node.js SSL — two orthogonal root causes that produce identical "API Error" symptoms | daily/2026-04-19.md | 2026-04-19 | +| [[wiki/connections/fastapi-azuread-docker-trinity]] | FastAPI ↔ Azure AD ↔ Docker — the three always go together; env vars, CORS, auth middleware wiring | 2026-04-27 | 2026-04-27 | +| [[wiki/connections/ai-always-needs-cost-tracker]] | AI API calls ↔ cost-tracker preflight/record — why retrofitting costs 2 days; per-provider token fields | 2026-04-27 | 2026-04-27 | +| [[wiki/connections/optical-dev-apache-vite-basepath]] | Apache subpath ↔ Vite basePath — two configs that must match; what breaks when they don't | 2026-04-27 | 2026-04-27 | +| [[wiki/connections/gcp-no-websockets]] | GCP LB 30s timeout ↔ REST polling ↔ job table pattern — infrastructure constraint forces code architecture | 2026-04-27 | 2026-04-27 | +| [[wiki/connections/box-api-hotfolder-pattern]] | Box API ↔ hotfolder daemon — always paired; archive pattern prevents double-processing | 2026-04-27 | 2026-04-27 | diff --git a/wiki/connections/ai-always-needs-cost-tracker.md b/wiki/connections/ai-always-needs-cost-tracker.md new file 
mode 100644 index 0000000..e912186 --- /dev/null +++ b/wiki/connections/ai-always-needs-cost-tracker.md @@ -0,0 +1,83 @@ +--- +title: "Connection: Every AI Call Needs Cost Tracker" +description: "Why ai-cost-tracker must be added at project start, not retrofitted — and how preflight/record connects to every AI provider" +connects: + - "tech-patterns/python-ai-agents" + - "tech-patterns/cost-tracker-integration" + - "concepts/preflight-record-pattern" +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Connection: Every AI Call Needs Cost Tracker + +## The Connection + +AI provider integrations (Gemini, Claude, OpenAI, ElevenLabs) and the Oliver cost tracker are not independent systems — the tracker's preflight/record pattern wraps every AI call. Projects built without cost tracking have no budget enforcement, and retroactively adding it means touching every AI call site. + +## Key Insight + +**Adding cost-tracker at project start costs 30 minutes. Retrofitting costs 2 days.** The preflight() call happens before the AI call; record() happens after. The tracker maintains per-project, per-team budgets and hard-blocks calls when exceeded. Without it, a runaway prompt loop or buggy retry logic can exhaust a monthly budget in minutes. + +## The Pattern + +```python +# Every AI call site looks like this. +# cost_tracker is a shared httpx.AsyncClient pointed at the tracker; gemini is the project's async Gemini wrapper. +class BudgetExceeded(Exception): + pass + +async def call_ai(prompt: str, user_id: str): + # 1. Estimate and gate + preflight_resp = await cost_tracker.post("/preflight", json={ + "source_app": "my-project", + "model": "gemini-1.5-pro", + "estimated_input_tokens": len(prompt) // 4, # rough ~4 chars per token + "estimated_output_tokens": 500, + "user_id": user_id + }) + preflight_resp.raise_for_status() + if not preflight_resp.json()["allow"]: + raise BudgetExceeded("Monthly budget reached") + + # 2. Make the actual AI call + response = await gemini.generate(prompt) + + # 3.
Record actual usage (Gemini field names; other providers differ, see table below) + await cost_tracker.post("/usage/record", json={ + "source_app": "my-project", + "model": "gemini-1.5-pro", + "input_tokens": response.usage_metadata.prompt_token_count, + "output_tokens": response.usage_metadata.candidates_token_count + }) + return response +``` + +## Per-Provider Gotchas + +| Provider | Input / output token fields | Notes | +|----------|------------|-------| +| Gemini | `response.usage_metadata.prompt_token_count` / `candidates_token_count` | Available after call | +| OpenAI | `response.usage.prompt_tokens` / `completion_tokens` | `max_completion_tokens` not `max_tokens` for o1/o3 | +| Claude | `response.usage.input_tokens` / `output_tokens` | Via Anthropic SDK | +| ElevenLabs | `len(text)` characters | No token concept — billed by chars | +| GCP TTS | `len(text)` characters | Same as ElevenLabs | + +See [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] for full details. + +## Why Retrofitting Is Hard + +1. Every AI call site must be wrapped — in multi-agent systems, that's 4–10 places +2. The async httpx client needs to be shared (not a new client per call) +3. Budget workspace/team/project hierarchy must be set up in the admin UI first +4. Historical data is lost — you can't see what projects cost before integration + +## Projects with Cost Tracker + +ai-cost-tracker itself, Video Accessibility Platform, Enterprise Nexus (in progress). + +## Projects that Need It Added + +Any project making AI API calls without cost tracking: Mod Comms (Gemini), PIMCO Charts (Claude), Sandbox NotebookLM, Oliver AI Bot 2.0, Semblance.
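One way to keep the retrofit cost low is to route every call site through a single wrapper, so preflight/record lives in exactly one place. A minimal sketch under stated assumptions: the `Tracker` protocol stands in for the shared cost-tracker client (its `preflight`/`record` method names are hypothetical, not the tracker's real client API), the usage field paths come from the per-provider table above, and character-billed providers (ElevenLabs, GCP TTS) are left out. `FakeTracker` only exists so the sketch runs without the service or any SDK:

```python
import asyncio
from typing import Any, Awaitable, Callable, Protocol


class BudgetExceeded(Exception):
    """Raised when the tracker's preflight check refuses the call."""


class Tracker(Protocol):
    """Hypothetical interface over the cost-tracker HTTP API (/preflight, /usage/record)."""

    async def preflight(self, payload: dict[str, Any]) -> bool: ...

    async def record(self, payload: dict[str, Any]) -> None: ...


# (input path, output path) on each provider's response object.
USAGE_FIELDS: dict[str, tuple[str, str]] = {
    "gemini": ("usage_metadata.prompt_token_count", "usage_metadata.candidates_token_count"),
    "openai": ("usage.prompt_tokens", "usage.completion_tokens"),
    "claude": ("usage.input_tokens", "usage.output_tokens"),
}


def _read_path(obj: Any, dotted: str) -> int:
    """Follow a dotted attribute path, e.g. 'usage.input_tokens'."""
    for attr in dotted.split("."):
        obj = getattr(obj, attr)
    return int(obj)


def extract_usage(provider: str, response: Any) -> tuple[int, int]:
    in_path, out_path = USAGE_FIELDS[provider]
    return _read_path(response, in_path), _read_path(response, out_path)


async def tracked_call(
    tracker: Tracker,
    *,
    source_app: str,
    provider: str,
    model: str,
    prompt: str,
    call: Callable[[str], Awaitable[Any]],
    est_output_tokens: int = 500,
) -> Any:
    """Preflight, then the provider call, then record: one place for every call site."""
    allowed = await tracker.preflight({
        "source_app": source_app,
        "model": model,
        "estimated_input_tokens": len(prompt) // 4,  # rough ~4 chars per token
        "estimated_output_tokens": est_output_tokens,
    })
    if not allowed:
        raise BudgetExceeded(f"{source_app}: budget reached for {model}")
    response = await call(prompt)
    input_tokens, output_tokens = extract_usage(provider, response)
    await tracker.record({
        "source_app": source_app,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    })
    return response


class FakeTracker:
    """In-memory stand-in so the wrapper can run without the real service."""

    def __init__(self, allow: bool = True) -> None:
        self.allow = allow
        self.records: list[dict[str, Any]] = []

    async def preflight(self, payload: dict[str, Any]) -> bool:
        return self.allow

    async def record(self, payload: dict[str, Any]) -> None:
        self.records.append(payload)


if __name__ == "__main__":
    from types import SimpleNamespace

    async def fake_claude(prompt: str) -> Any:
        # Shaped like an Anthropic response; only the usage fields matter here.
        return SimpleNamespace(usage=SimpleNamespace(input_tokens=10, output_tokens=20))

    tracker = FakeTracker()
    asyncio.run(tracked_call(tracker, source_app="demo", provider="claude",
                             model="claude-model", prompt="hello", call=fake_claude))
    print(tracker.records)
```

In a multi-agent system each agent then passes its own provider call into `tracked_call`, so adding a new agent does not add a new place where tracking can be forgotten.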
+ +## Related + +- [[wiki/tech-patterns/cost-tracker-integration|cost-tracker-integration]] — 9-step integration guide +- [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] — per-provider billing units +- [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] — the 3-step pattern +- [[wiki/tech-patterns/python-ai-agents|python-ai-agents]] — AI agent patterns diff --git a/wiki/connections/box-api-hotfolder-pattern.md b/wiki/connections/box-api-hotfolder-pattern.md new file mode 100644 index 0000000..5c5fa52 --- /dev/null +++ b/wiki/connections/box-api-hotfolder-pattern.md @@ -0,0 +1,89 @@ +--- +title: "Connection: Box API + Hotfolder Daemon — Always Together" +description: "Box API for asset access and the hotfolder daemon for folder monitoring are paired in all Ford and L'Oréal workflows" +connects: + - "tech-patterns/box-api-integration" + - "architecture/hotfolder-daemon" +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Connection: Box API + Hotfolder Daemon — Always Together + +## The Connection + +Box API integration at Oliver always comes with a hotfolder daemon pattern. The Box API reads/writes assets; the hotfolder daemon watches specific folders for new files and triggers processing. They solve complementary problems: Box API is the transport, hotfolder is the trigger. + +## Key Insight + +**The hotfolder archive pattern prevents double-processing.** Without it, a file would be processed repeatedly until manually removed. The pattern: detect new file → process → move to `/processed/` archive. Box webhooks are an alternative but require a public endpoint and webhook management; polling is simpler and reliable. 
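The archive rule in the key insight can be sketched against a local directory standing in for the Box folder. This is a stdlib-only illustration of the move-after-processing idea, not Box SDK code; `run_once` and the `process` callback are hypothetical names, and a real daemon would call something like this from its polling loop:

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Callable, Optional


def run_once(incoming: Path, process: Callable[[Path], None],
             today: Optional[date] = None) -> dict[str, list[str]]:
    """One pass of a hotfolder loop over a local directory.

    Every file directly inside `incoming` is processed, then moved to
    _processed/<date>/ on success or _errors/ on failure, so the next pass
    never sees it again. Subdirectories (including both archives) are skipped.
    """
    stamp = (today or date.today()).isoformat()
    done_dir = incoming / "_processed" / stamp
    err_dir = incoming / "_errors"
    result: dict[str, list[str]] = {"processed": [], "errors": []}
    for path in sorted(incoming.iterdir()):
        if not path.is_file():
            continue  # archive folders live under incoming/ and must not be re-scanned
        try:
            process(path)
        except Exception:
            err_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(err_dir / path.name))
            result["errors"].append(path.name)
        else:
            done_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(done_dir / path.name))
            result["processed"].append(path.name)
    return result
```

Because a file leaves the watched level as soon as it is handled, a second pass is a no-op even if the process restarts between runs.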
+ +## The Pattern + +``` +Box Folder (/incoming/) + ↓ daemon polls every 30s +Hotfolder Script (box_monitor.py) + ↓ new file detected +Process File (download → transform → upload result) + ↓ success +Move to /incoming/_processed/{date}/ + ↓ or on failure +Move to /incoming/_errors/ +``` + +## Implementation + +```python +import time + +# Daemon loop (systemd service). box_folder, processed_folder, errors_folder and the +# is_processed / process_file / move_to_archive helpers are project code defined elsewhere. +while True: + items = box_folder.get_items(fields=["id", "name", "created_at"]) + for item in items: + if item.type == "file" and not is_processed(item.id): + try: + process_file(item) + except Exception: + move_to_archive(item, errors_folder) # failure → /incoming/_errors/ + else: + move_to_archive(item, processed_folder) # success → /incoming/_processed/{date}/ + time.sleep(30) +``` + +The daemon runs as a systemd service on box-cli-01 (CentOS 7): +```ini +[Service] +ExecStart=/usr/bin/python3 /opt/ford-qc/box_monitor.py +Restart=always +RestartSec=10 +``` + +## Box API Auth + +Two modes used across Oliver projects: + +| Mode | Used for | Credential | +|------|---------|-----------| +| Service account (JWT) | Server daemons (hotfolder, scheduled jobs) | `config.json` from Box Developer Console | +| OAuth 2.0 | User-facing tools (Ferrero AC Booking) | Client ID + Secret + redirect URI | + +For hotfolder daemons, always use a service account (JWT): no user interaction, runs 24/7. + +## Where This Pattern Is Used + +| Project | Client | What it does | +|---------|--------|-------------| +| Ford QC System | Ford | Watch Box → download proofs → AI quality check → write result | +| Ford SFTP Transfer | Ford | Watch Box → SFTP push to Ford servers | +| Ferrero AC Booking | Ferrero | "Send to OMG" button → upload CSV to Box folder | +| L'Oréal Global Kickoff | L'Oréal | Box asset management for kickoff materials | + +## Gotchas + +- **`_processed/` must be excluded from monitoring** — otherwise the daemon reprocesses archived files +- **Box rate limits:** 10 API calls/second per app.
For large folders, add `time.sleep(0.1)` between items +- **box-cli-01 is CentOS 7 (EOL):** No Docker, Python 3.6 default — use `/usr/bin/python3` path explicitly +- **NFS mount at `/mnt/nfs`:** box-cli-01 has 1TB NFS for large asset storage; processed files go here, not Box + +## Related + +- [[wiki/tech-patterns/box-api-integration|box-api-integration]] — Box API patterns and auth +- [[wiki/architecture/hotfolder-daemon|hotfolder-daemon]] — systemd daemon pattern +- [[wiki/client-knowledge/ford|ford]] — Ford QC + SFTP projects +- [[wiki/client-knowledge/loreal|loreal]] — L'Oréal Box workflows +- [[wiki/infrastructure/server-box-cli|server-box-cli]] — box-cli-01 server details diff --git a/wiki/connections/fastapi-azuread-docker-trinity.md b/wiki/connections/fastapi-azuread-docker-trinity.md new file mode 100644 index 0000000..c1f106f --- /dev/null +++ b/wiki/connections/fastapi-azuread-docker-trinity.md @@ -0,0 +1,85 @@ +--- +title: "Connection: FastAPI + Azure AD + Docker — The Oliver Trinity" +description: "These three always appear together in Oliver internal tools — how they wire up and where each touches the other" +connects: + - "tech-patterns/fastapi-python-docker" + - "tech-patterns/azure-ad-msal-auth" + - "architecture/docker-compose-fullstack" +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Connection: FastAPI + Azure AD + Docker — The Oliver Trinity + +## The Connection + +FastAPI, Azure AD MSAL, and Docker Compose appear together in almost every Oliver internal tool. They're designed independently but have specific integration points that only become clear across multiple projects. + +## Key Insight + +**The auth middleware must run inside the container, but credentials must come from outside it.** This creates a specific pattern: Azure AD env vars flow in through Docker Compose env_file, the MSAL JWKS validation runs in FastAPI middleware, and the Docker healthcheck must not hit authenticated endpoints (it has no token). 
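The key insight above reduces to a small amount of logic: read the Azure settings from the container's environment, let the healthcheck path through without a token, and check audience, issuer and expiry on claims whose signature has already been verified. A stdlib-only sketch under assumptions: v2.0 access tokens (issuer `https://login.microsoftonline.com/{tenant}/v2.0`), an audience of either the client ID or `api://{client_id}`, and `/health` as the healthcheck path. Signature verification against the JWKS belongs to a JWT library and is not shown; all names here are hypothetical:

```python
import time
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Optional

PUBLIC_PATHS = frozenset({"/health"})  # assumed healthcheck path; Docker sends no token


@dataclass(frozen=True)
class AuthSettings:
    tenant_id: str
    client_id: str
    disabled: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "AuthSettings":
        # Values arrive through the service's env_file, not baked into the image.
        return cls(
            tenant_id=env.get("AZURE_TENANT_ID", ""),
            client_id=env.get("AZURE_CLIENT_ID", ""),
            disabled=env.get("DISABLE_AUTH", "false").lower() == "true",
        )

    @property
    def issuer(self) -> str:
        return f"https://login.microsoftonline.com/{self.tenant_id}/v2.0"

    @property
    def metadata_url(self) -> str:
        # OpenID configuration document; its jwks_uri points at the signing keys.
        return f"{self.issuer}/.well-known/openid-configuration"


def needs_token(path: str, settings: AuthSettings) -> bool:
    """True if a request to `path` must carry a Bearer token."""
    return not settings.disabled and path not in PUBLIC_PATHS


def check_claims(claims: Mapping[str, Any], settings: AuthSettings,
                 now: Optional[float] = None) -> Optional[str]:
    """Return None if signature-verified claims are acceptable, else a reason string."""
    now = time.time() if now is None else now
    if claims.get("aud") not in {settings.client_id, f"api://{settings.client_id}"}:
        return "wrong audience"
    if claims.get("iss") != settings.issuer:
        return "wrong issuer"
    if float(claims.get("exp", 0)) <= now:
        return "expired"
    return None
```

Passing the environment in as a mapping, rather than reading `os.environ` inside each check, makes it easy to test the `DISABLE_AUTH` and healthcheck paths without touching the container.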
+ +The three systems interact at exactly three points: +1. **JWT validation**: FastAPI reads `AZURE_TENANT_ID` + `AZURE_CLIENT_ID` from env → fetches JWKS from Azure → validates tokens from MSAL.js frontend +2. **CORS**: FastAPI CORS origins must include the exact frontend origin (no trailing slash) — when running in Docker, the origin is the host's port/domain, not the container's internal address +3. **Local dev bypass**: `DISABLE_AUTH=true` skips Azure AD entirely in dev — this env var must be in the Docker service's env_file, not just the host shell + +## The Wiring + +``` +Browser (MSAL.js PKCE) + ↓ acquireTokenSilent() → Azure AD → access_token (JWT) + ↓ Authorization: Bearer {token} +Apache/nginx + ↓ passes all headers intact (do NOT strip Authorization) +FastAPI middleware (HTTPBearer) + ↓ reads jwks_uri from https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration, then fetches the signing keys (JWKS) + ↓ validates token signature + audience + expiry + ↓ injects user claims into request.state.user +Route handlers + ↓ read request.state.user.preferred_username / roles +``` + +## Environment Variable Pattern + +```yaml +# docker-compose.yml — the right way +services: + api: + env_file: ./backend/.env # contains AZURE_* + DISABLE_AUTH + + frontend: + environment: + # ${...} is interpolated from the host shell or the root .env, not backend/.env + # VITE_* values are inlined at build time, so they only apply if the build (or dev server) runs in this container + - VITE_AZURE_CLIENT_ID=${AZURE_CLIENT_ID} + - VITE_AZURE_TENANT_ID=${AZURE_TENANT_ID} +``` + +```env +# backend/.env +AZURE_TENANT_ID=xxx +AZURE_CLIENT_ID=xxx +AZURE_CLIENT_SECRET=xxx # only if app-only calls needed +DISABLE_AUTH=true # remove in production +``` + +## When DISABLE_AUTH Breaks + +Teams sometimes forget to remove `DISABLE_AUTH=true` on the server. Symptoms: authenticated routes return 200 to anyone with no token.
Add a startup check: + +```python +import logging +import os + +logger = logging.getLogger(__name__) + +if os.getenv("DISABLE_AUTH", "false").lower() == "true": + logger.warning("⚠️ AUTH IS DISABLED — do not use in production") +``` + +## Projects Where This Trinity Appears + +GMAL, Mod Comms, Video Accessibility, Semblance, Enterprise Nexus, Barclays Banner Builder, BAIC Dashboard. + +## Related + +- [[wiki/tech-patterns/fastapi-python-docker|fastapi-python-docker]] +- [[wiki/tech-patterns/azure-ad-msal-auth|azure-ad-msal-auth]] +- [[wiki/architecture/docker-compose-fullstack|docker-compose-fullstack]] +- [[wiki/architecture/new-project-checklist|new-project-checklist]] diff --git a/wiki/connections/gcp-no-websockets.md b/wiki/connections/gcp-no-websockets.md new file mode 100644 index 0000000..849cbed --- /dev/null +++ b/wiki/connections/gcp-no-websockets.md @@ -0,0 +1,93 @@ +--- +title: "Connection: GCP LB Timeout → REST Polling → Job Table Pattern" +description: "The chain from GCP infrastructure constraint to application architecture — why the 30s LB timeout forces a specific code pattern" +connects: + - "architecture/gcp-deployment-lb-timeout" + - "tech-patterns/redis-celery-worker-queue" + - "tech-patterns/fastapi-python-docker" +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Connection: GCP LB Timeout → REST Polling → Job Table Pattern + +## The Connection + +GCP's HTTP(S) load balancer has a default backend service timeout of 30 seconds. For ordinary requests it is the time the backend has to respond; for WebSocket connections it caps how long the connection can stay open, idle or not. This single infrastructure constraint forces a specific application architecture pattern: job tables, HTTP polling, and eventually Celery for heavy workloads. The constraint propagates from infrastructure all the way to frontend code. + +## Key Insight + +**The 30s timeout isn't a WebSocket-specific problem — it affects any long-running synchronous request.** A `/api/analyze` endpoint that takes 45 seconds will also be killed. The fix isn't just replacing WebSockets; it's making the entire async workflow non-blocking within 30 seconds.
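One way to read "non-blocking within 30 seconds" is: give the work a time budget below the timeout, answer directly if it finishes, and otherwise hand back a job ID while the work keeps running. A minimal asyncio sketch; the in-memory `JobStore`, the `respond_or_defer` name and the 25 s default budget are assumptions for illustration (a real service would persist jobs in the database, as in the job-table pattern):

```python
import asyncio
import uuid
from typing import Any, Awaitable


class JobStore:
    """In-memory job registry; holds strong references so tasks aren't garbage-collected."""

    def __init__(self) -> None:
        self.tasks: dict[str, asyncio.Task] = {}

    def status(self, job_id: str) -> dict[str, Any]:
        task = self.tasks[job_id]
        if not task.done():
            return {"status": "running"}
        if task.cancelled():
            return {"status": "error", "result": "cancelled"}
        exc = task.exception()
        if exc is not None:
            return {"status": "error", "result": str(exc)}
        return {"status": "done", "result": task.result()}


async def respond_or_defer(work: Awaitable[Any], store: JobStore,
                           budget_s: float = 25.0) -> dict[str, Any]:
    """Return the result if `work` finishes within budget_s, else a job_id to poll."""
    task = asyncio.ensure_future(work)
    try:
        # shield() stops wait_for from cancelling the task when the budget runs out.
        result = await asyncio.wait_for(asyncio.shield(task), timeout=budget_s)
        return {"status": "done", "result": result}
    except asyncio.TimeoutError:
        job_id = uuid.uuid4().hex
        store.tasks[job_id] = task
        return {"status": "pending", "job_id": job_id}
```

Fast requests keep a single round trip, while slow ones fall back to the same polling contract the frontend already uses.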
+ +## The Constraint Chain + +``` +GCP Load Balancer: 30s backend service timeout (default) + ↓ forces +No WebSockets (connection held open → killed at 30s) + ↓ forces +No long synchronous API calls > 30s + ↓ forces +Job table pattern: POST creates job → GET polls status + ↓ for heavy parallelism: +Redis + Celery for worker queue management +``` + +## Minimum Implementation (no Celery needed) + +```python +import asyncio +from fastapi import Depends, HTTPException +from sqlalchemy.ext.asyncio import AsyncSession + +# router, JobRequest, Job, get_db and run_job are project code. +# Keep references to background tasks: the event loop only holds weak references. +background_tasks: set[asyncio.Task] = set() + +# POST /api/jobs → returns immediately with job_id +@router.post("/jobs") +async def create_job(request: JobRequest, db: AsyncSession = Depends(get_db)): + job = Job(status="pending", ...) + db.add(job) + await db.commit() + await db.refresh(job) # reload after commit; expired attributes can't lazy-load in async + task = asyncio.create_task(run_job(job.id)) # fire and forget + background_tasks.add(task) + task.add_done_callback(background_tasks.discard) + return {"job_id": job.id} + +# GET /api/jobs/{id} → returns current status +@router.get("/jobs/{job_id}") +async def get_job(job_id: str, db: AsyncSession = Depends(get_db)): + job = await db.get(Job, job_id) + if job is None: + raise HTTPException(status_code=404, detail="job not found") + return {"status": job.status, "result": job.result} +``` + +```ts +// Frontend: poll every 2s until done +const poll = async (jobId: string) => { + while (true) { + const r = await fetch(`${API}/api/jobs/${jobId}`); + if (!r.ok) throw new Error(`poll failed: HTTP ${r.status}`); + const { status, result } = await r.json(); + if (status === "done") return result; + if (status === "error") throw new Error(String(result)); + await new Promise((resolve) => setTimeout(resolve, 2000)); + } +}; +``` + +## When to Add Celery + +The simple `asyncio.create_task()` pattern breaks when: +- Multiple AI agents run in parallel and saturate the event loop +- Tasks need retry logic on failure +- You need visibility into queue depth / worker utilization +- Jobs run for >5 minutes + +For those cases: see [[wiki/tech-patterns/redis-celery-worker-queue|redis-celery-worker-queue]]. + +## Applies Also to optical-dev + +On optical-dev, the Apache + corporate LB timeout is 30–60s, so the same pattern is required there, not just on GCP.
+ +## Projects Using This Pattern + +- Mod Comms: switched from WebSocket after production incident (2026-03-18) +- Enterprise Nexus: HTTP polling from day 1 +- Video Accessibility: Celery for heavy FFmpeg + AI pipeline + +## Related + +- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]] — full implementation +- [[wiki/tech-patterns/redis-celery-worker-queue|redis-celery-worker-queue]] — when Celery is needed +- [[wiki/architecture/adr-log|adr-log]] — ADR-001 documents this decision +- [[wiki/client-knowledge/barclays|barclays]] — Mod Comms incident diff --git a/wiki/connections/optical-dev-apache-vite-basepath.md b/wiki/connections/optical-dev-apache-vite-basepath.md new file mode 100644 index 0000000..879e90f --- /dev/null +++ b/wiki/connections/optical-dev-apache-vite-basepath.md @@ -0,0 +1,102 @@ +--- +title: "Connection: optical-dev Apache Subpath + Vite basePath" +description: "The two-system pairing that makes React SPAs work at /project-name/ — Apache and Vite must be configured together or nothing works" +connects: + - "architecture/optical-dev-server-deploy" + - "tech-patterns/react-vite-typescript" + - "concepts/nextjs-basepath-auth-redirects" +created: 2026-04-27 +updated: 2026-04-27 +--- + +# Connection: optical-dev Apache Subpath + Vite basePath + +## The Connection + +React SPAs deployed to optical-dev live at a URL subpath (`/barclays-banner-builder/`), not the domain root. This requires Apache and Vite to both be configured with the same base path — getting one wrong without the other causes different but equally confusing failures. + +## Key Insight + +**Apache routes to the SPA. Vite tells the SPA where it lives.** These are two independent configurations that must match. A mismatch produces symptoms that look like routing bugs or blank pages, not configuration errors.
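Since the mismatch comes from setting the same path in several places, one option is to derive every value from a single input in the deploy script. A sketch with hypothetical `normalize_base` / `base_path_settings` helpers; the keys mirror the deploy checklist in this note (Vite `base`, React Router `basename`, API prefix, Azure AD redirect URI, Apache paths), and the default origin is the optical-dev URL used throughout:

```python
def normalize_base(path: str) -> str:
    """'/project-name/', 'project-name' or '/project-name' -> '/project-name'; root -> ''."""
    stripped = path.strip().strip("/")
    return f"/{stripped}" if stripped else ""


def base_path_settings(path: str,
                       origin: str = "https://optical-dev.oliver.solutions") -> dict[str, str]:
    """Derive every base-path-dependent value from one input so they cannot drift."""
    base = normalize_base(path)
    return {
        # Vite expects a base with a trailing slash, e.g. "/project-name/".
        "vite_base": f"{base}/",
        # React Router basename: no trailing slash.
        "router_basename": base or "/",
        # Prefix for fetch() calls; Apache proxies this path to the backend.
        "api_prefix": f"{base}/api",
        # Must match the redirect URI registered in the Azure Portal exactly.
        "redirect_uri": f"{origin}{base}/auth/callback",
        # Apache fragment paths.
        "apache_alias": base or "/",
        "apache_proxy_path": f"{base}/api/",
    }
```

A deploy script could print these as `KEY=value` lines and feed them to both the Vite build and the Apache template, so the two configs come from the same string.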
+ +## The Two Configurations That Must Match + +### Apache fragment (`/opt/project/deploy/apache-project.conf`) +```apache +# API proxy (MUST come before Alias) +ProxyPass /project-name/api/ http://127.0.0.1:8010/api/ +ProxyPassReverse /project-name/api/ http://127.0.0.1:8010/api/ + +# SPA static files +Alias /project-name /var/www/html/project-name +<Directory /var/www/html/project-name> + RewriteEngine On + RewriteBase /project-name/ + RewriteCond %{REQUEST_FILENAME} !-f + RewriteCond %{REQUEST_FILENAME} !-d + RewriteRule ^ index.html [L] +</Directory> +``` + +### Vite + React (`vite.config.ts` + `main.tsx`) +```ts +// vite.config.ts +import { defineConfig } from "vite"; + +export default defineConfig({ + // Vite expects a trailing slash: "/project-name/" + base: `${process.env.VITE_BASE_PATH ?? ""}/`, +}) + +// main.tsx +<BrowserRouter basename={import.meta.env.VITE_BASE_PATH}><App /></BrowserRouter> + +// api calls — must include base path +const API = import.meta.env.VITE_BASE_PATH ?? ""; +fetch(`${API}/api/endpoint`) +``` + +### Build command +```bash +VITE_BASE_PATH=/project-name npm run build +``` + +## What Breaks When Misconfigured + +| Misconfiguration | Symptom | +|-----------------|---------| +| Apache has subpath, Vite doesn't | App loads at `/project-name/` but JS assets 404 (wrong asset paths) | +| Vite has basePath, Apache doesn't | 404 on all subpath URLs | +| API calls don't include basePath | API calls work locally (no Apache), fail in prod | +| ProxyPass AFTER Alias | Apache serves `index.html` for all `/api/` calls → API returns HTML → JSON parse error | +| Azure AD redirect URI doesn't include basePath | Auth callback 404 after login | + +## Auth + basePath + +Azure AD redirect URI must include the subpath.
See [[wiki/concepts/nextjs-basepath-auth-redirects|nextjs-basepath-auth-redirects]]: + +``` +✅ https://optical-dev.oliver.solutions/project-name/auth/callback +❌ https://optical-dev.oliver.solutions/auth/callback +``` + +In MSAL config, build the absolute URI so it matches the Portal entry exactly (and stays valid when `VITE_BASE_PATH` is unset): +```ts +auth: { + redirectUri: `${window.location.origin}${import.meta.env.VITE_BASE_PATH ?? ""}/auth/callback` +} +``` + +## Checklist When Deploying a New SPA to optical-dev + +- [ ] Apache fragment: ProxyPass before Alias, trailing slashes on ProxyPass +- [ ] Vite config: `base: VITE_BASE_PATH` +- [ ] React Router: `basename={VITE_BASE_PATH}` +- [ ] All `fetch()` calls: prefix with `VITE_BASE_PATH` +- [ ] Build command: `VITE_BASE_PATH=/project-name npm run build` +- [ ] Azure AD redirect URI: includes subpath in Portal + env var +- [ ] `apache2ctl configtest` passes before reload + +## Related + +- [[wiki/architecture/optical-dev-server-deploy|optical-dev-server-deploy]] — full Apache config reference +- [[wiki/tech-patterns/react-vite-typescript|react-vite-typescript]] — Vite patterns +- [[wiki/concepts/nextjs-basepath-auth-redirects|nextjs-basepath-auth-redirects]] — basePath + auth +- [[wiki/architecture/troubleshooting-playbooks|troubleshooting-playbooks]] — when it breaks