From 92929d3b41d226839e2acd50934547df2e4fe6e8 Mon Sep 17 00:00:00 2001
From: Vadym Samoilenko
Date: Mon, 27 Apr 2026 12:29:28 +0100
Subject: [PATCH] vault backup: 2026-04-27 12:29:28

---
 wiki/architecture/_index.md          | 13 +++---
 wiki/architecture/ai-cost-tracker.md | 58 +++++++++++++++++++++-------
 2 files changed, 49 insertions(+), 22 deletions(-)

diff --git a/wiki/architecture/_index.md b/wiki/architecture/_index.md
index 9961765..1e6f7b2 100644
--- a/wiki/architecture/_index.md
+++ b/wiki/architecture/_index.md
@@ -3,7 +3,7 @@ title: "Architecture Patterns Index"
 description: "Cross-cutting architectural decisions across Oliver Agency projects"
 tags: [index, architecture]
 created: 2026-04-15
-updated: 2026-04-15
+updated: 2026-04-27
 ---
 
 # Architecture Patterns
@@ -19,15 +19,14 @@ Cross-cutting architectural decisions that appear in multiple Oliver projects.
 | [[wiki/architecture/gcp-deployment-lb-timeout\|gcp-deployment-lb-timeout]] | GCP 30s LB timeout — WebSocket → HTTP polling fix | Mod Comms, Semblance |
 | [[wiki/architecture/rag-architecture\|rag-architecture]] | RAG: Firecrawl → AI structuring → Qdrant → LLM synthesis | Enterprise Nexus, Sandbox NotebookLM |
 | [[wiki/architecture/hotfolder-daemon\|hotfolder-daemon]] | Box folder monitoring daemon with systemd | Ford QC, Ford SFTP |
+| [[wiki/architecture/optical-dev-server-deploy\|optical-dev-server-deploy]] | optical-dev Apache subpath pattern: single vhost, Include conf, port table, deploy script | All Oliver projects |
+| [[wiki/architecture/ai-cost-tracker\|ai-cost-tracker]] | Shared AI cost tracker: Docker Compose, Workspace→Team→Project, preflight/record HTTP API, LiteLLM pricing, hard budget limits | All Oliver projects |
 
 ## Key Architectural Decisions
 
-1. **Docker Compose** — default deployment for all multi-service projects
+1. **Docker Compose** — default deployment for all multi-service projects on optical-dev
 2. **HTTP polling over WebSocket** — mandatory on GCP (30s LB timeout)
 3. **AI pre-structuring before RAG indexing** — improves retrieval quality
 4. **Hotfolder + archive pattern** — prevents reprocessing in Box automations
-5. **DEV_AUTH_BYPASS** — skip Azure AD in local dev, always use real auth in production
-
-
-| [[wiki/architecture/optical-dev-server-deploy\|optical-dev-server-deploy]] | optical-dev GCP server: single-vhost Apache, Include pattern, port table, deploy script cache | Barclays Banner Builder, all Oliver projects |
-| [[wiki/architecture/ai-cost-tracker\|ai-cost-tracker]] | Shared AI cost tracking service: Workspace→Team→Project, LiteLLM pricing, preflight+record SDK, hard limits | All Oliver projects |
\ No newline at end of file
+5. **DEV_AUTH_BYPASS / dev login** — skip Azure AD in local/dev environment, real auth in production
+6. **Cost tracking as cross-cutting concern** — every AI call preflight+record via ai-cost-tracker
diff --git a/wiki/architecture/ai-cost-tracker.md b/wiki/architecture/ai-cost-tracker.md
index 7ed555c..f3d83c0 100644
--- a/wiki/architecture/ai-cost-tracker.md
+++ b/wiki/architecture/ai-cost-tracker.md
@@ -9,6 +9,17 @@ updated: 2026-04-27
 
 Centralised **shared service** that tracks AI API spend across all Oliver projects. Every project that calls Gemini, ElevenLabs, Google Cloud TTS, or other paid AI APIs sends usage events here.
 
+## Live URLs
+
+| Environment | URL |
+|---|---|
+| Dev (optical-dev) | `https://optical-dev.oliver.solutions/cost-tracker/` |
+| API health | `https://optical-dev.oliver.solutions/cost-tracker/v1/health` |
+| API docs | `https://optical-dev.oliver.solutions/cost-tracker/v1/docs` |
+| Prod (future) | `https://cost.oliver.agency/` |
+
+**Repo:** `git@bitbucket.org:zlalani/ai-cost-tracker.git`
+
 ## Why it exists
 
 Oliver runs multiple projects (video-accessibility, One2Edit, Box-pipelines …) all consuming paid AI APIs. Without centralised tracking: no visibility into total spend, no per-client cost attribution, no budget enforcement.
@@ -23,9 +34,9 @@ Oliver runs multiple projects (video-accessibility, One2Edit, Box-pipelines …)
 │ AI call sites │ preflt │ FastAPI + MongoDB + Redis + React │
 │ (Gemini, ElevenLabs, GCP TTS) │────────►│ POST /v1/preflight │
 │ │ │ │ POST /v1/usage/record │
-│ oliver_cost_tracker SDK │ │ POST /v1/users/upsert │
-│ - preflight(estimate) │◄────────│ POST /v1/projects/upsert │
-│ - record(actual) │ │ │
+│ Direct HTTP calls (httpx) │ │ POST /v1/users/upsert │
+│ - POST /v1/preflight │◄────────│ POST /v1/projects/upsert │
+│ - POST /v1/usage/record │ │ │
 │ - SQLite outbox + retry │ │ Admin UI (Microsoft SSO) │
 └──────────────────────────────────┘ │ Workspaces / Teams / Projects │
 │ Pricing + LiteLLM auto-sync │
@@ -38,14 +49,15 @@ Oliver runs multiple projects (video-accessibility, One2Edit, Box-pipelines …)
 
 | Decision | Choice | Why |
 |---|---|---|
-| Deployment | Separate repo + separate server | Clean isolation, independent scaling |
+| Deployment | Separate repo + Docker Compose on optical-dev | Clean isolation, independent scaling |
 | Org hierarchy | Workspace → Team → Project | Matches Oliver agency structure |
 | User ownership | Each project owns users; lazy mirror in shared | No SSO migration needed |
 | Pricing | LiteLLM auto-sync + YAML (non-LLM) + admin override | Auto-updated for LLMs, manual for chars |
 | Transport | Sync HTTP + SQLite outbox fallback | Never breaks the AI pipeline |
 | Budget enforcement | Hard limits via preflight check | `allow=false` before call is made |
-| Auth (projects) | API key per project | Simple, revocable, auditable |
-| Auth (admins) | Microsoft SSO | Consistent with all Oliver projects |
+| Auth (projects) | API key per project (`X-API-Key` header) | Simple, revocable, auditable |
+| Auth (admins) | Microsoft SSO (+ dev login for testing) | Consistent with all Oliver projects |
+| Integration | Direct HTTP calls from client projects | No SDK package to maintain/publish |
 
 ## Org hierarchy
 
@@ -60,12 +72,31 @@ Users live in each project (video-accessibility, etc.) and are **lazily mirrored
 
 ## Tech stack
 
-Mirrors video-accessibility for team familiarity:
-- Backend: **FastAPI + MongoDB Atlas + Redis + Celery**
-- Frontend: **React 18 + Vite (TypeScript)**
-- Auth admin: **Microsoft AAD (MSAL)**
-- Charts: **recharts**
-- Tables/pivot: **@tanstack/react-table**
+- Backend: **FastAPI + MongoDB 7 + Redis 7 + Celery** (worker + beat)
+- Frontend: **React 18 + Vite + TypeScript** — nginx inside Docker
+- Auth admin: **Microsoft Azure AD OIDC** + dev login (APP_ENV=dev only)
+- Deploy: **Docker Compose** on optical-dev, Apache reverse proxy at `/cost-tracker/`
+
+## Docker Compose services
+
+| Service | Port | Role |
+|---|---|---|
+| `api` | 8003→8001 | FastAPI + uvicorn (2 workers) |
+| `frontend` | 5174→80 | React SPA + nginx |
+| `mongodb` | internal | MongoDB 7 |
+| `redis` | internal | Redis 7 |
+| `celery-worker` | — | Async task processing |
+| `celery-beat` | — | Scheduled tasks (pricing sync 02:00 UTC) |
+
+## API endpoints (project-facing, `X-API-Key` auth)
+
+| Method | Path | Purpose |
+|---|---|---|
+| `POST` | `/v1/preflight` | Check budget, get `request_id` |
+| `POST` | `/v1/usage/record` | Record actual AI call cost |
+| `POST` | `/v1/users/upsert` | Create/update user mirror |
+| `POST` | `/v1/projects/upsert` | Create/update project mirror |
+| `GET` | `/v1/health` | Liveness check |
 
 ## Related articles
 
@@ -73,6 +104,3 @@ Mirrors video-accessibility for team familiarity:
 - [[wiki/tech-patterns/cost-tracker-pricing-sources|cost-tracker-pricing-sources]] — how pricing is maintained
 - [[wiki/tech-patterns/cost-tracker-providers|cost-tracker-providers]] — billing units per AI provider
 - [[wiki/concepts/preflight-record-pattern|preflight-record-pattern]] — the core usage-tracking pattern
-- [[wiki/concepts/lazy-user-mirror|lazy-user-mirror]] — how user sync works
-- [[wiki/concepts/sync-with-outbox|sync-with-outbox]] — resilient HTTP calls with SQLite fallback
-- [[wiki/projects-overview/ai-cost-tracker|ai-cost-tracker (project card)]] — project registry card
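
The patch replaces the SDK with direct HTTP calls to `/v1/preflight` and `/v1/usage/record`, authenticated with an `X-API-Key` header. A minimal Python sketch of what a client-side call site might look like under that design is given here. Only the two paths, the header name, the `allow` flag, `request_id`, and the dev base URL come from the patch; the payload fields, the helper names (`make_http_post`, `tracked_call`, `BudgetExceeded`), and the use of the standard library instead of httpx are assumptions made for illustration. The SQLite outbox and retry path is not covered.

```python
import json
import urllib.request
from typing import Callable

# Dev base URL taken from the "Live URLs" table in the patch.
BASE_URL = "https://optical-dev.oliver.solutions/cost-tracker"

# A transport is any callable that POSTs a JSON payload to a path and returns the JSON reply.
Post = Callable[[str, dict], dict]


class BudgetExceeded(RuntimeError):
    """Hypothetical error raised when preflight does not allow the call."""


def make_http_post(api_key: str, base_url: str = BASE_URL, timeout: float = 5.0) -> Post:
    """Build a transport that sends JSON with the project's X-API-Key header."""

    def post(path: str, payload: dict) -> dict:
        request = urllib.request.Request(
            base_url + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json", "X-API-Key": api_key},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)

    return post


def tracked_call(post: Post, estimate: dict, run_ai_call: Callable[[], dict]) -> dict:
    """Preflight, run the AI call only if allowed, then record the actual usage.

    The shapes of `estimate` and of the usage dict returned by `run_ai_call`
    are assumptions; the patch does not define the request bodies.
    """
    verdict = post("/v1/preflight", estimate)
    if not verdict.get("allow", False):
        # Hard limit: the paid API is never called when the budget check fails.
        raise BudgetExceeded("preflight returned allow=false")
    actual = run_ai_call()
    post("/v1/usage/record", {"request_id": verdict["request_id"], **actual})
    return actual
```

Passing the transport in as a parameter keeps the preflight and record ordering testable without a running service; a real call site would pass `make_http_post(api_key)` and wrap the record step in whatever outbox fallback the project uses.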