Vadym Samoilenko 3e9ccafad2 Add LLM usage tracking infrastructure (Phases A-C)

- Model renames: gpt-5.2 → gpt-5.4-2026-03-05, gemini-3-pro-preview → gemini-3.1-pro-preview; retire gpt-4.1 via alias fallback
- New: llm_usage_context.py (ContextVar-based attribution), model_pricing.py (tiered pricing + 60s cache), usage_event.py (append-only telemetry), quota.py (user/FG quota enforcement with 80% warning)
- Wire _record_usage into all 3 LLM methods; set_llm_context at every service entry point
- Fix admin_required decorator (was sync, never awaited User.find_by_id); add active_required and with_user_context decorators
- Inject user_id into ContextVar from JWT on every authenticated request
- Add DB indexes for usage_events, model_pricing, users collections
- Seed script for model pricing (gpt-5.4 single-tier, gemini-3.1 two-tier 200k threshold)
- Fix parse_json_response NameError (logger undefined at module level)
- 70 passing tests: conftest.py with sys.modules stubs, test_usage_infrastructure.py (52 tests), rewrite stale test_llm_service.py (18 tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-24 18:08:27 +01:00

5.3 KiB

Executable file


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

Dev Server: npm run dev (port 5173, proxies /api → localhost:5137)
Build: npm run build (use this to verify TypeScript compilation)
Dev Build: npm run build:dev (development mode build)
Lint: npm run lint
Backend: cd backend && python run.py (Hypercorn ASGI on port 5137)

Backend Testing

After modifying any Python files:

source backend/venv/bin/activate
python -c "import app.services.module_name"        # Test specific module
python -c "from app import create_app; create_app()"  # Test app creation
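The same import checks can be batched into one script when several modules changed at once. This is a minimal sketch, not something in the repository: `smoke_import` is a hypothetical helper, and the module names in the demo are placeholders to be replaced with the modules you edited.

```python
import importlib


def smoke_import(module_names):
    """Try to import each module; return {name: error text} for the failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # import-time errors are not always ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Placeholder names: swap in the modules you just edited.
    for name, error in smoke_import(["json", "definitely_missing_module_xyz"]).items():
        print(f"FAIL {name}: {error}")
```

Run it from the backend directory with the virtualenv active so that package imports resolve the same way they do for the app.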

Architecture Overview

ASGI Stack (critical detail)

create_app() returns a socketio.ASGIApp wrapping a Quart app — not the Quart app itself. Accessing app.quart_app gives the inner Quart instance. This distinction matters whenever you write ASGI middleware or access app config directly.
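A small helper can make the wrapper/inner distinction explicit. The sketch below uses stand-in classes so that it runs without Quart or python-socketio installed; `StandInQuart`, `StandInASGIApp`, and `unwrap_quart` are hypothetical names, and only the `quart_app` attribute comes from the description above.

```python
class StandInQuart:
    """Stand-in for the inner Quart app; models only the config mapping."""

    def __init__(self):
        self.config = {"SECRET_KEY": "example"}


class StandInASGIApp:
    """Stand-in for the socketio.ASGIApp wrapper that create_app() returns."""

    def __init__(self, quart_app):
        # The inner Quart instance is exposed as `quart_app`.
        self.quart_app = quart_app


def unwrap_quart(app):
    """Return the inner Quart app if `app` is the wrapper, else `app` itself."""
    return getattr(app, "quart_app", app)


wrapped = StandInASGIApp(StandInQuart())
inner = unwrap_quart(wrapped)

# Config lives on the inner app, not on the ASGI wrapper.
print(inner.config["SECRET_KEY"])
print(hasattr(wrapped, "config"))
```

Code that needs app config would call something like `unwrap_quart(create_app())`, while ASGI middleware would wrap the outer object.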

Real-Time Communication

Socket.IO via python-socketio AsyncServer (ASGI mode). The WebSocketContextNew.tsx context manages the client connection. websocket_manager_async.py handles room-based messaging for focus group sessions. The WebSocket manager must call ws_mgr.set_main_loop(asyncio.get_running_loop()) at startup so that cross-thread emits from the AI Runner land on the right loop.
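The reason for capturing the loop at startup can be shown with the standard library alone. This is a sketch under assumptions: `LoopBoundEmitter` and `emit_from_thread` are hypothetical names, and the `_emit` body only records the call where a real manager would await the Socket.IO server.

```python
import asyncio


class LoopBoundEmitter:
    """Hypothetical stand-in for the manager's main-loop handling."""

    def __init__(self):
        self._main_loop = None
        self.sent = []  # (room, event, payload, ran_on_main_loop)

    def set_main_loop(self, loop):
        self._main_loop = loop

    async def _emit(self, room, event, payload):
        # A real manager would await the Socket.IO server's emit here.
        on_main_loop = asyncio.get_running_loop() is self._main_loop
        self.sent.append((room, event, payload, on_main_loop))

    def emit_from_thread(self, room, event, payload):
        """Schedule an emit on the main loop from any other thread."""
        if self._main_loop is None:
            raise RuntimeError("set_main_loop() must be called at startup")
        return asyncio.run_coroutine_threadsafe(
            self._emit(room, event, payload), self._main_loop
        )


async def main():
    mgr = LoopBoundEmitter()
    mgr.set_main_loop(asyncio.get_running_loop())

    def worker():
        # Runs on another thread, as the AI Runner does.
        mgr.emit_from_thread("room-1", "message", {"text": "hi"}).result(timeout=5)

    # to_thread keeps the main loop free to run the scheduled coroutine.
    await asyncio.to_thread(worker)
    return mgr.sent


sent = asyncio.run(main())
```

`asyncio.run_coroutine_threadsafe` is the standard way to hand a coroutine to a loop owned by another thread; without a stored loop reference the worker thread has nothing to target.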

VITE_ENABLE_WEBSOCKET is hardcoded true in dev and false in production builds via vite.config.ts — it is not controlled by .env.

AI Runner + Threading

ai_runner_service.py is a singleton that owns a dedicated OS thread with a single asyncio event loop. All autonomous AI conversations run in this thread. This solves Motor (AsyncIOMotorClient) event-loop affinity: Motor clients in the AI runner are bound to that loop, while regular API routes use synchronous PyMongo. Never share Motor clients between the two contexts.
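The dedicated-thread pattern described above looks roughly like this in plain asyncio. It is a sketch, not the service itself: `DedicatedLoopRunner`, `submit`, and `stop` are hypothetical names, and singleton wiring and Motor client creation are left out.

```python
import asyncio
import threading


class DedicatedLoopRunner:
    """One OS thread that owns exactly one asyncio event loop."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(
            target=self._run, name="ai-runner", daemon=True
        )
        self._thread.start()

    def _run(self):
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def submit(self, coro):
        """Schedule a coroutine on the runner's loop from any thread."""
        return asyncio.run_coroutine_threadsafe(coro, self._loop)

    def stop(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5)
        self._loop.close()


async def which_thread():
    # Anything loop-bound (such as a Motor client) would be created in here.
    return threading.current_thread().name


runner = DedicatedLoopRunner()
thread_name = runner.submit(which_thread()).result(timeout=5)
runner.stop()
```

Creating loop-bound resources inside coroutines passed to `submit` is what keeps them attached to the runner's loop rather than to the request-handling side.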

Autonomous Conversation Pipeline

ai_runner_service.py — spawns coroutines on the dedicated thread's event loop
autonomous_conversation_controller.py — orchestrates the full session
conversation_decision_service.py — picks the next speaker
conversation_context_service.py — maintains history/state
conversation_state_manager.py — in-memory state across turns
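How these pieces fit together can be sketched as a single turn loop. Everything below is illustrative: the function and class names are hypothetical, the round-robin policy is a placeholder for whatever the decision service actually does, and `generate_turn` returns a canned line where the real pipeline would call an LLM.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ConversationState:
    """In-memory state across turns (placeholder for the state manager)."""

    participants: list
    history: list = field(default_factory=list)


def pick_next_speaker(state):
    """Placeholder policy: round-robin over the participants."""
    return state.participants[len(state.history) % len(state.participants)]


async def generate_turn(speaker, history):
    """Placeholder for an LLM call; returns a canned line."""
    await asyncio.sleep(0)
    return f"{speaker} responds to turn {len(history)}"


async def run_session(participants, max_turns):
    """Controller loop: decide who speaks, generate the turn, record it."""
    state = ConversationState(participants=list(participants))
    for _ in range(max_turns):
        speaker = pick_next_speaker(state)
        text = await generate_turn(speaker, state.history)
        state.history.append((speaker, text))
    return state.history


history = asyncio.run(run_session(["Ana", "Ben"], max_turns=3))
```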

Task Manager

task_manager.py is a singleton tracking cancellable asyncio tasks (persona generation, discussion guides, etc.). Tasks are exposed via /api/tasks routes. A background sweeper cleans up completed/expired tasks. Frontend polling is handled by useTaskPolling.ts.
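A registry of cancellable tasks with a sweep step can be sketched as follows. The names (`TaskRegistry`, `start`, `status`, `cancel`, `sweep`), the status strings, and the TTL rule are assumptions for illustration; the actual task_manager.py API is not shown here.

```python
import asyncio
import time
import uuid


class TaskRegistry:
    """Hypothetical registry of cancellable asyncio tasks."""

    def __init__(self, ttl_seconds=60.0):
        self._tasks = {}     # task_id -> asyncio.Task
        self._finished = {}  # task_id -> monotonic completion time
        self._ttl = ttl_seconds

    def start(self, coro):
        task_id = uuid.uuid4().hex
        task = asyncio.create_task(coro)
        self._tasks[task_id] = task
        task.add_done_callback(
            lambda _t: self._finished.setdefault(task_id, time.monotonic())
        )
        return task_id

    def status(self, task_id):
        task = self._tasks.get(task_id)
        if task is None:
            return "unknown"
        if task.cancelled():
            return "cancelled"
        if task.done():
            return "failed" if task.exception() else "completed"
        return "running"

    def cancel(self, task_id):
        task = self._tasks.get(task_id)
        return task is not None and task.cancel()

    def sweep(self, now=None):
        """Drop tasks that finished at least ttl_seconds ago; return the count."""
        now = time.monotonic() if now is None else now
        expired = [tid for tid, t in self._finished.items() if now - t >= self._ttl]
        for tid in expired:
            self._tasks.pop(tid, None)
            self._finished.pop(tid, None)
        return len(expired)


async def demo():
    registry = TaskRegistry(ttl_seconds=0.0)
    slow = registry.start(asyncio.sleep(10))
    quick = registry.start(asyncio.sleep(0))
    await asyncio.sleep(0.05)  # let the quick task finish
    registry.cancel(slow)
    await asyncio.sleep(0.05)  # let the cancellation propagate
    statuses = (registry.status(quick), registry.status(slow))
    removed = registry.sweep()
    return statuses, removed, registry.status(quick)


statuses, removed, after_sweep = asyncio.run(demo())
```

A background sweeper would call `sweep()` on a timer; a polling client would see "unknown" once a finished task has been swept.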

LLM Integration

llm_service.py creates a fresh client for each call, which avoids event-loop mismatches under ASGI. Default model: Google Gemini via google-genai. Alternative: OpenAI (AsyncOpenAI). Both GEMINI_API_KEY and OPENAI_API_KEY must be set as env vars; startup fails if either is missing. Prompts are markdown templates in /backend/prompts/ loaded by prompt_loader.py.
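The event-loop mismatch that per-call clients avoid can be reproduced with a stand-in. `LoopBoundClient` below is hypothetical and only mimics an async SDK client that binds to the loop it was created on; it is not the google-genai or OpenAI API.

```python
import asyncio


class LoopBoundClient:
    """Stand-in for an async client tied to the loop it was created on."""

    def __init__(self):
        self._loop = asyncio.get_running_loop()

    async def complete(self, prompt):
        if asyncio.get_running_loop() is not self._loop:
            raise RuntimeError("client used on a different event loop")
        return f"echo: {prompt}"


async def generate(prompt):
    # Fresh client per call: always bound to the loop that is running now.
    client = LoopBoundClient()
    return await client.complete(prompt)


async def make_client():
    return LoopBoundClient()


# Each asyncio.run() uses a new loop, and the per-call client works on both.
first = asyncio.run(generate("hello"))
second = asyncio.run(generate("again"))

# A client created on one loop and reused on another fails.
shared = asyncio.run(make_client())
try:
    asyncio.run(shared.complete("reuse"))
    shared_error = None
except RuntimeError as exc:
    shared_error = str(exc)
```

The trade-off is extra client construction on every call in exchange for never holding a client across loops.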

Code Style

TypeScript with strictNullChecks: false
Functional components with hooks; local state via hooks, shared state via context/props
@/ alias maps to src/
URL construction: always use ${import.meta.env.BASE_URL}asset.png — production base is /semblance/
Error handling: try/catch + sonner toast for user feedback

File Organization

backend/
  app/
    routes/          # Blueprints: auth, personas, focus-groups, ai-personas, focus-group-ai, folders, tasks
    services/        # Business logic: llm_service, ai_runner_service, task_manager, autonomous_*, conversation_*
    models/          # Data models: User, FocusGroup, Persona, Folder
    auth/            # Auth utilities (JWT helpers)
    prompts/         # LLM prompt markdown templates
    websocket_manager_async.py  # Room-based async WebSocket manager
    extensions.py    # socketio.AsyncServer singleton

src/
  pages/             # Route-level components (Dashboard, FocusGroups, FocusGroupSession, Login, SyntheticUsers)
  components/
    focus-group-session/  # Session UI panels (Discussion, Participant, Themes, etc.)
    persona/         # Persona management components
    ui/              # shadcn-ui primitives
  contexts/          # AuthContext, WebSocketContextNew, NavigationContext
  hooks/             # useTaskPolling, useWebSocket, usePersonaStorage, useDiscussionGuideGeneration, etc.
  types/             # TypeScript type definitions

Environment Configuration

| Setting | Development | Production |
| --- | --- | --- |
| Base path | `/` | `/semblance/` |
| API base | `/api` (proxied to 5137) | `https://optical-dev.oliver.solutions/semblance_back/api` |
| WebSocket path | `/socket.io/` | `/semblance_back/socket.io/` |
| MSAL redirect | `http://localhost:5173/` | `https://optical-dev.oliver.solutions/semblance` |

Setup: copy .env.development or .env.production to .env. The backend requires backend/.env with SECRET_KEY, JWT_SECRET_KEY, GEMINI_API_KEY, and OPENAI_API_KEY; startup raises RuntimeError if any of them is missing or uses a weak default.
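A fail-fast check of this kind can be sketched in a few lines. `validate_env` is a hypothetical name, and the `WEAK_DEFAULTS` set is an assumption: the values the backend actually rejects are not listed here.

```python
import os

REQUIRED_KEYS = ("SECRET_KEY", "JWT_SECRET_KEY", "GEMINI_API_KEY", "OPENAI_API_KEY")

# Assumption: illustrative values only; the real rejected defaults are not documented.
WEAK_DEFAULTS = {"", "changeme", "secret", "dev"}


def validate_env(env=None):
    """Raise RuntimeError if a required key is missing or looks like a weak default."""
    env = os.environ if env is None else env
    problems = []
    for key in REQUIRED_KEYS:
        value = env.get(key)
        if value is None:
            problems.append(f"{key} is missing")
        elif value.strip().lower() in WEAK_DEFAULTS:
            problems.append(f"{key} uses a weak default")
    if problems:
        raise RuntimeError("Invalid backend configuration: " + "; ".join(problems))


# Passing a dict makes the check easy to exercise without touching os.environ.
good = {key: f"value-for-{key}" for key in REQUIRED_KEYS}
validate_env(good)
```

Collecting all problems before raising reports every misconfigured key in one error instead of one per restart.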

Knowledge Wiki

A cross-project knowledge base is maintained automatically from all Claude Code sessions.

Index: /Users/ai_leed/Library/Mobile Documents/iCloud~md~obsidian/Documents/VadymSamoilenko/wiki/index.md
Query: cd ~/.claude/memory-compiler && uv run python scripts/query.py "your question"
