# Brief Extractor - Comprehensive Technical Documentation v2.0

- **Document Version:** 2.0
- **Last Updated:** October 7, 2025
- **Application Version:** 1.0.0
- **Author:** Technical Documentation Team

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [System Architecture Overview](#system-architecture-overview)
3. [Backend Architecture](#backend-architecture)
4. [Frontend Architecture](#frontend-architecture)
5. [Data Flow and Processing Pipeline](#data-flow-and-processing-pipeline)
6. [Authentication and Security](#authentication-and-security)
7. [WebSocket Real-Time Communication](#websocket-real-time-communication)
8. [API Reference](#api-reference)
9. [Data Models and Schemas](#data-models-and-schemas)
10. [Configuration Management](#configuration-management)
11. [Deployment Architecture](#deployment-architecture)
12. [Error Handling and Logging](#error-handling-and-logging)
13. [Performance and Scalability](#performance-and-scalability)
14. [Development Guide](#development-guide)
15. [Troubleshooting Guide](#troubleshooting-guide)

---

## Executive Summary

The **Brief Extractor** is an enterprise-grade, multi-tenant document analysis platform that leverages multiple cutting-edge AI models (OpenAI GPT-5, Anthropic Claude, Google Gemini) in parallel to extract structured marketing asset information from unstructured creative briefs and presentations.
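The parallel, multi-model idea summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the application's code: `fake_model`, `analyze_in_parallel`, `run_demo`, the model names, and the delays are all hypothetical stand-ins for real provider calls, and the rule that at least one model must succeed is assumed here for the sake of the example.

```python
import asyncio


async def fake_model(name: str, delay: float, fail: bool = False) -> dict:
    """Hypothetical stand-in for one provider call (no real SDK is used)."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return {"model": name, "assets": [f"asset-from-{name}"]}


async def analyze_in_parallel(calls, minimum_success: int = 1) -> list:
    """Run all model calls concurrently and keep only the successful results."""
    # return_exceptions=True turns failures into values instead of aborting the batch
    results = await asyncio.gather(*calls, return_exceptions=True)
    successes = [r for r in results if not isinstance(r, BaseException)]
    if len(successes) < minimum_success:
        raise RuntimeError("too few models succeeded")
    return successes


def run_demo() -> list:
    """Fan out to three fake models, one of which fails."""
    calls = [
        fake_model("model-a", 0.03),
        fake_model("model-b", 0.02, fail=True),
        fake_model("model-c", 0.01),
    ]
    return asyncio.run(analyze_in_parallel(calls))
```

Calling `run_demo()` returns the results of the two models that succeeded, in the order the calls were passed in, because `asyncio.gather` preserves argument order regardless of which call finishes first.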
### Key Features

- **Multi-Model AI Analysis:** Parallel processing using 3+ AI models simultaneously for comprehensive data extraction
- **Intelligent Consolidation:** Advanced deduplication and merging of multi-model results
- **Real-Time Progress Tracking:** WebSocket-based live updates with provider-specific progress reporting
- **Enterprise Authentication:** Microsoft Azure AD (MSAL) SSO integration with PKCE flow
- **Multi-Tenant Architecture:** Complete user isolation with per-user job queuing and data segregation
- **Scalable Processing:** Asynchronous job queue with configurable concurrency limits
- **Production-Ready:** Comprehensive error handling, logging, monitoring, and recovery mechanisms

### Technology Stack

**Backend:**

- **Framework:** Quart (async Python web framework)
- **AI Models:** OpenAI GPT-5, Anthropic Claude Opus 4.1/Sonnet 4, Google Gemini 2.5 Pro
- **Document Processing:** LlamaParser cloud service for OCR and extraction
- **Authentication:** MSAL (Microsoft Authentication Library) with JWT validation
- **Real-Time:** WebSocket with automatic reconnection and health monitoring
- **Data Storage:** File-based storage with automatic cleanup and retention policies

**Frontend:**

- **Framework:** React 18 with TypeScript
- **Build Tool:** Vite 5 with HMR and optimized production builds
- **State Management:** Zustand for global state, TanStack Query for server state
- **UI Framework:** Tailwind CSS with custom design system
- **Authentication:** MSAL React with Azure AD integration
- **Real-Time:** Native WebSocket client with exponential backoff reconnection

---

## System Architecture Overview

### High-Level Architecture

```mermaid
graph TB
    subgraph Browser["Client Browser (Frontend)"]
        React["React Application<br/>- Upload UI<br/>- Queue View<br/>- Authentication"]
        WSClient["WebSocket Client<br/>- Live Updates<br/>- Auto-Reconnect<br/>- Connection Health"]
        React <--> WSClient
    end

    subgraph WebServer["Web Server (Apache/Nginx)"]
        SSL["SSL/TLS Termination"]
        WSProxy["WebSocket Upgrade Proxy"]
        Static["Static File Serving"]
    end

    subgraph Backend["Quart Application Server (Backend)"]
        subgraph API["API Layer"]
            AuthAPI["Auth API"]
            JobsAPI["Jobs API"]
            ConfigAPI["Config API"]
        end
        subgraph Queue["Job Queue System"]
            AsyncQueue["Async Queue"]
            Workers["Workers (5)"]
            Semaphore["Semaphore"]
        end
        subgraph WS["WebSocket Manager"]
            Connections["Connections"]
            Broadcasting["Broadcasting"]
            UserTargeting["User Targeting"]
        end
        subgraph Processing["Job Processing Engine"]
            Extract["Content Extraction<br/>(LlamaParser)"]
            Analysis["Parallel Multi-Model<br/>Analysis"]
            Consolidation["Result Consolidation"]
            CSV["CSV Generation"]
        end
    end

    subgraph External["External Services"]
        OpenAI["OpenAI API<br/>(GPT-5)"]
        Anthropic["Anthropic API<br/>(Claude)"]
        Google["Google AI API<br/>(Gemini)"]
        Llama["LlamaCloud API<br/>(Parsing)"]
    end

    React -->|"HTTPS/REST API"| WebServer
    WSClient -->|"WSS (WebSocket)"| WebServer
    WebServer --> API
    WebServer --> WS
    API --> Queue
    Queue --> Processing
    Processing --> OpenAI
    Processing --> Anthropic
    Processing --> Google
    Processing --> Llama
    Processing --> WS
```

### Component Communication Flow

```mermaid
sequenceDiagram
    actor User
    participant Frontend
    participant API as API Endpoint
    participant JobMgr as Job Manager
    participant Queue as Job Queue
    participant Worker as Worker Pool
    participant LLM as LLM Services
    participant WS as WebSocket

    User->>Frontend: Upload File
    Frontend->>API: POST /api/jobs
    API->>JobMgr: create_job()
    JobMgr->>JobMgr: Save file to disk
    JobMgr->>Queue: Enqueue job_id
    JobMgr->>WS: Broadcast job.created
    WS->>Frontend: Job created event
    API->>Frontend: Job[] response

    Queue->>Worker: Pull job_id
    Worker->>Worker: Phase 1: Extract Content
    Worker->>LLM: LlamaParser API
    LLM->>Worker: Markdown content
    Worker->>WS: Progress update (25%)
    WS->>Frontend: job.progress event

    Worker->>Worker: Phase 2: Parallel Analysis
    par Parallel Execution
        Worker->>LLM: OpenAI GPT-5
        Worker->>LLM: Anthropic Claude
        Worker->>LLM: Google Gemini
    end
    LLM->>Worker: All results
    Worker->>WS: Progress update (75%)
    WS->>Frontend: job.progress event

    Worker->>Worker: Phase 3: Consolidation
    Worker->>LLM: Consolidation model
    LLM->>Worker: Merged results
    Worker->>WS: Progress update (90%)

    Worker->>Worker: Phase 4: Generate CSV
    Worker->>WS: job.completed
    WS->>Frontend: Completion event
    Frontend->>User: Show download button
```

### Deployment Architecture

**Production Deployment:**

- **Frontend:** `https://ai-sandbox.oliver.solutions/brief-extractor/`
- **Backend API:** `https://ai-sandbox.oliver.solutions/brief-extractor-back/api`
- **Backend WebSocket:** `wss://ai-sandbox.oliver.solutions/brief-extractor-back/ws`

**Development Environment:**

- **Frontend:** `http://localhost:3000`
- **Backend API:** `http://localhost:8000/api`
- **Backend WebSocket:**
`ws://localhost:8000/ws`

---

## Backend Architecture

### Technology Stack

**Core Framework:** Quart 0.19+ (async Python web framework based on Flask API)

- Chosen for native async/await support required for parallel LLM calls
- ASGI-based for WebSocket support
- Compatible with Hypercorn ASGI server

**Key Dependencies:**

- `quart` - Async web framework
- `quart-cors` - CORS middleware for cross-origin requests
- `openai>=1.0.0` - OpenAI GPT-5 client with responses API
- `anthropic>=0.67.0` - Anthropic Claude client with async support
- `google-genai[aiohttp]>=0.4.0` - Google Gemini client with aiohttp
- `llama-cloud-services>=0.6.62` - LlamaParser document extraction
- `msal>=1.24.0` - Microsoft Authentication Library
- `PyJWT>=2.8.0` - JWT token validation
- `structlog` - Structured logging for production environments
- `python-dotenv` - Environment variable management
- `pydantic>=2.0.0` - Data validation and schema definition

### Directory Structure

```mermaid
graph LR
    subgraph Server["server/ - Backend Application"]
        App["app.py<br/>Main application"]
        Config["config_runtime.py<br/>Runtime config"]
        subgraph API["api/ - REST Endpoints"]
            AuthAPI["auth.py<br/>/api/auth/*"]
            ConfigAPI["config.py<br/>/api/config/*"]
            JobsAPI["jobs.py<br/>/api/jobs/*"]
        end
        subgraph Auth["auth/ - Authentication"]
            MSAL["msal_auth.py<br/>JWT validation"]
            Middleware["middleware.py<br/>Decorators"]
        end
        subgraph Jobs["jobs/ - Job Management"]
            Models["models.py<br/>Data models"]
            Manager["manager.py<br/>Singleton registry"]
            Storage["storage.py<br/>File operations"]
        end
        subgraph Runners["runners/ - Execution"]
            JobRunner["job_runner.py<br/>Workers"]
            EnhancedAnalyzer["enhanced_analyzer.py<br/>Progress hooks"]
            Progress["progress.py<br/>Reporting"]
        end
        subgraph WS["ws/ - WebSocket"]
            WSManager["manager.py<br/>Connections"]
        end
    end

    subgraph Core["core/ - Processing Engine"]
        CoreConfig["config.py<br/>Model config"]
        ProcessBrief["process_brief_enhanced.py<br/>DocumentAnalyzer"]
        Consolidation["consolidation_processor.py<br/>Result merging"]
        subgraph LLMService["llm_service/ - Providers"]
            Base["base_provider.py<br/>Abstract interface"]
            OpenAI["openai_provider.py<br/>GPT-5"]
            Anthropic["anthropic_provider.py<br/>Claude"]
            GoogleProv["google_provider.py<br/>Gemini"]
            ProvManager["provider_manager.py<br/>Parallel coordinator"]
        end
    end

    App --> API
    App --> Auth
    App --> Jobs
    App --> Runners
    App --> WS
    Runners --> Core

    style Server fill:#e3f2fd
    style Core fill:#e8f5e9
```

### Core Components

#### 1. Application Factory (`server/app.py`)

**Purpose:** Creates and configures the Quart application with all routes, middleware, and lifecycle hooks.

**Key Responsibilities:**

- Register API blueprints (`auth_bp`, `config_bp`, `jobs_bp`)
- Configure CORS for cross-origin requests
- Initialize WebSocket manager and job queue
- Set up application lifecycle (`before_serving`, `after_serving`)
- Start/stop background worker tasks
- Define health check and WebSocket endpoints
- Configure error handlers (400, 401, 403, 404, 413, 500)

**Lifecycle Management:**

```python
@app.before_serving
async def startup():
    # Start WebSocket background tasks (ping, cleanup)
    await ws_manager.start_background_tasks()

    # Start job processing workers (configurable count)
    background_workers = await start_background_workers(
        job_manager, ws_manager,
        num_workers=server_config.MAX_CONCURRENT_JOBS
    )

    # Schedule periodic cleanup (hourly)
    cleanup_task = asyncio.create_task(periodic_cleanup(job_manager))

@app.after_serving
async def shutdown():
    # Stop all background workers gracefully
    await stop_background_workers(background_workers)
    await ws_manager.stop_background_tasks()
```

**Critical Configuration:**

- `MAX_CONTENT_LENGTH`: File upload size limit (200MB default)
- `SESSION_SECRET`: Used for secure cookie signing
- `SECURE_COOKIES`, `HTTPS_ONLY`: Security flags for production

#### 2. Job Manager (`server/jobs/manager.py`)

**Pattern:** Thread-safe Singleton

**Purpose:** Central registry and queue for all processing jobs

**Architecture:**

```mermaid
classDiagram
    class JobManager {
        <<singleton>>
        -_instance: JobManager
        -jobs: Dict[str, Job]
        -queue: asyncio.Queue
        -processing_semaphore: Semaphore
        -storage: StorageManager
        -_lock: asyncio.Lock
        +create_job(file, user_id) Job
        +get_job(job_id) Job
        +get_user_jobs(user_id) List~Job~
        +delete_job(job_id) bool
        +cleanup_expired_jobs() int
        +serialize_all() List~Dict~
        +get_instance()$ JobManager
    }
    class Job {
        +id: str
        +user_id: str
        +phase: JobPhase
        +progress_pct: int
        +provider_updates: Dict
        +logs: List~LogEntry~
        +model_config: ModelConfiguration
        +update_progress(phase, pct, label)
        +mark_completed(url, summary, path)
        +mark_failed(error)
        +to_dict() Dict
    }
    class StorageManager {
        +upload_dir: Path
        +output_dir: Path
        +save_uploaded_file(data, filename, job_id) str
        +validate_file(filename, size) tuple
        +cleanup_job_files(upload, output)
        +cleanup_expired_files() int
    }
    JobManager --> Job : manages
    JobManager --> StorageManager : uses
```

**Key Operations:**

**Job Creation Flow:**

1. Validate file (extension, size, name)
2. Create Job instance with unique UUID
3. Save file to disk via `StorageManager`
4. Add job to in-memory registry
5. Enqueue job ID for processing
6. Return job to API endpoint

**User Isolation:**

- Each job tagged with `user_id` from authenticated token
- `get_user_jobs()` filters by `user_id`
- Users can only see/access their own jobs
- WebSocket broadcasts filtered by user

**Concurrency Control:**

```python
# Semaphore limits concurrent processing
processing_semaphore = asyncio.Semaphore(MAX_CONCURRENT_JOBS)

# Worker acquires semaphore before processing
async with job_manager.processing_semaphore:
    await run_job(job, ws_manager)
```

**Cleanup and Retention:**

- Periodic cleanup every hour via `periodic_cleanup()` task
- Removes jobs older than `FILE_RETENTION_HOURS` (24h default)
- Cleans up orphaned upload/output files
- Preserves active and recent jobs

#### 3. Storage Manager (`server/jobs/storage.py`)

**Purpose:** Safe file operations with validation and cleanup

**Directory Structure:**

```
server/data/
├── uploads/    # Uploaded documents (temporary)
│   └── {job_id}_{sanitized_filename}.{ext}
└── outputs/    # Generated CSV files
    └── {sanitized_basename}-{timestamp}.csv
```

**File Operations:**

- **Validation:** Extension whitelist, size limits, filename sanitization
- **Safe Naming:** Job ID prefix to prevent collisions
- **Async I/O:** Uses `run_in_executor()` to avoid blocking event loop
- **Automatic Cleanup:** Removes files older than retention period

**Security Features:**

- Filename sanitization removes special characters
- Length limits reject overlong filenames; together with sanitization, this guards against path traversal
- Extension whitelist: `.pdf`, `.pptx`, `.docx`, `.xlsx`, `.ppt`, `.doc`, `.xls`
- No execution of uploaded files

#### 4. Job Processing Pipeline (`server/runners/job_runner.py`)

**Architecture:** Background worker pool processing jobs from async queue

**Worker Pool:**

```python
# Configurable number of workers (default: 5)
workers = []
for i in range(num_workers):
    worker = asyncio.create_task(
        process_job_queue(job_manager, ws_manager),
        name=f"job-worker-{i}"
    )
    workers.append(worker)
```

**Job Execution Flow:**

```
1. Worker pulls job_id from queue (blocking until available)
2. Acquire processing semaphore (concurrency limit)
3. Create ProgressReporter for WebSocket updates
4. Execute run_job(job, ws_manager)
   ├─ Phase 1: Extract content (LlamaParser)
   ├─ Phase 2: Parallel multi-model analysis
   ├─ Phase 3: Consolidate results
   ├─ Phase 4: Generate CSV
   └─ Phase 5: Mark completed/failed
5. Release semaphore
6. Mark queue task as done
```

**Progress Reporting:**

- Each phase reports progress percentage (0-100)
- Provider-specific updates (started, success, error, tokens, cost)
- Real-time log streaming to WebSocket clients
- Automatic error capture and reporting

#### 5. LLM Service Layer (`core/llm_service/`)

**Design Pattern:** Provider abstraction with async parallel execution

**Provider Hierarchy:**

```mermaid
classDiagram
    class BaseLLMProvider {
        <<abstract>>
        +api_key: str
        +model_name: str
        +generate_response(messages, schema)* LLMResponse
        +validate_config()* bool
        +estimate_cost(input, output, cached)* float
        +get_max_tokens()* int
        +prepare_messages(system, user) List
    }
    class OpenAIProvider {
        +reasoning_effort: str
        +timeout: int
        +client: AsyncOpenAI
        +generate_response() LLMResponse
        +set_reasoning_effort(effort)
        -_create_pydantic_model(schema)
        -_save_debug_response()
    }
    class AnthropicProvider {
        +thinking_budget: int
        +temperature: float
        +client: AsyncAnthropic
        +generate_response() LLMResponse
        -_two_call_approach()
        -_convert_to_tool_schema()
    }
    class GoogleProvider {
        +thinking_budget: int
        +temperature: float
        +client: genai.Client
        +generate_response() LLMResponse
        -_convert_to_gemini_schema()
    }
    BaseLLMProvider <|-- OpenAIProvider
    BaseLLMProvider <|-- AnthropicProvider
    BaseLLMProvider <|-- GoogleProvider
```

**Common Interface:**

```python
async def generate_response(
    self,
    messages: List[Dict[str, str]],
    schema: Optional[Dict[str, Any]] = None,
    **kwargs
) -> LLMResponse:
    ...
```

**Provider-Specific Features:**

**OpenAI (`openai_provider.py`):**

- Uses `client.responses.parse()` API for structured output
- Configurable reasoning effort: `high`, `medium`, `low`, `minimal`
- Native Pydantic model support via `text_format` parameter
- Automatic retry with exponential backoff (max_retries: 2)
- Timeout: 3600 seconds (1 hour) for long documents
- Two-stage validation: check `output_parsed`, fallback to `choices[0].message.content`

**Anthropic (`anthropic_provider.py`):**

- Two-call approach due to thinking mode incompatibility with structured output:
  1. **Call A:** Extended thinking with analysis (no forced tools)
  2. **Call B:** Structured JSON formatting (no thinking)
- Thinking budget: 12,000 tokens (configurable)
- Temperature: 1.0 for creative analysis
- Max tokens: 32,000 (Claude Sonnet 4), 64,000 (Claude Opus 4.1)
- Schema conversion to Anthropic tool format

**Google (`google_provider.py`):**

- Uses new `google-genai` SDK with `client.aio` async methods
- Native thinking support with configurable budget (12,000 tokens)
- Schema conversion to Gemini response_schema format
- Largest context window: 1M tokens (Gemini 2.5 Pro)
- Temperature: 0.7 for balanced creativity/consistency

**Provider Manager (`provider_manager.py`):**

- Coordinates parallel execution across multiple providers
- Uses `asyncio.gather()` for true concurrent API calls
- Implements minimum success threshold (default: 1 model must succeed)
- Tracks per-provider timing, tokens, and costs
- Handles partial failures gracefully

**Parallel Execution Example:**

```python
# All models process simultaneously
responses = await provider_manager.execute_parallel_analysis(
    model_keys=['openai-gpt5', 'anthropic-sonnet4', 'google-gemini25'],
    messages=analysis_messages,
    schema=UNIVERSAL_BASE_DELIVERABLE_SCHEMA,
    minimum_success_threshold=1
)

# Total time is set by the slowest model, not by the sum of all models
# Example: max(GPT-5 110s, Claude 78s, Gemini 49s) = 110s total (not 237s)
```

#### 6. Consolidation System (`core/consolidation_processor.py`)

**Purpose:** Intelligently merge results from multiple AI models into optimal dataset

**Consolidation Strategy:**

1. **Inclusion Bias:** If any model found a legitimate deliverable, include it
2. **Normalization:** Canonicalize titles, categories, specifications before deduplication
3. **Smart Deduplication:** Merge only when core identity matches (not just similar text)
4. **Quality Enhancement:** Combine best specifications from all contributing models

**Process Flow:**

```
Input: [GPT-5 Result, Claude Result, Gemini Result]
  │
  ├─ Format results as comparison prompt
  ├─ Load consolidation strategy template
  └─ Execute with consolidation model (GPT-5 or Claude Opus)
       │
       ├─ Returns: Consolidated base deliverables
       ├─ Validation: Ensure 'assets' key present
       └─ Expansion: Generate individual assets from multipliers
```

**Multiplier Expansion:**

- Base deliverables have multiplier arrays (e.g., 5 sizes × 3 markets = 15 assets)
- Uses `itertools.product()` for Cartesian product expansion
- Validates: `technical_specifications × language_country_market ≈ quantity`

#### 7. Authentication System (`server/auth/`)

**Architecture:** MSAL-based SSO with development mode bypass

**Components:**

**MSALAuthenticator (`msal_auth.py`):**

- Validates JWT tokens from Microsoft Azure AD
- Supports PKCE flow (Public Client - no client secret required)
- Extracts user claims: `oid`, `preferred_username`, `name`, `roles`
- Token expiration checking via `exp` claim
- Audience validation (accepts Microsoft Graph audience)

**Middleware (`middleware.py`):**

- `@dev_mode_bypass`: Creates mock user in development, validates in production
- `@auth_required`: Strict authentication enforcement
- `@optional_auth`: Extracts user if present but doesn't require
- `get_user_id()`: Safely extracts user ID from request context

**Token Flow:**

```
1. Frontend obtains token via MSAL.js redirect flow
2. Token stored in localStorage
3. Frontend sends token in Authorization header: "Bearer "
4. Backend middleware extracts and validates token
5. User info stored in request context (g.current_user)
6. Endpoints access user via get_user_id()
```

**Development Mode:**

- `DEV_MODE=true`: Bypasses MSAL, creates mock user
- `DEV_MODE=false`: Requires valid Microsoft account authentication
- Never use DEV_MODE in production!

#### 8. WebSocket Manager (`server/ws/manager.py`)

**Pattern:** Singleton with async lock for thread safety

**Architecture:**

```python
from datetime import datetime
from typing import Any
from uuid import UUID

class WebSocketClient:
    client_id: UUID         # Unique connection identifier
    user_id: str            # Authenticated user ID (for targeting)
    connected_at: datetime  # Connection timestamp
    last_ping: datetime     # Heartbeat tracking
    websocket: Any          # Quart WebSocket object
```

**Connection Lifecycle:**

1. Client connects to `/ws?token=`
2. Backend validates token and extracts user ID
3. Create `WebSocketClient` and register in manager
4. Send `connection.established` acknowledgment
5. Send initial `queue.snapshot` with user's jobs
6. Enter message loop (ping/pong, handle client messages)
7. On disconnect: unregister client

**Broadcasting Modes:**

- `broadcast_to_all()`: Send to all connected clients
- `broadcast_to_user(user_id)`: Send to specific user's connections
- `broadcast_job_update(job_id)`: Send job-specific updates (currently broadcasts to all)

**Background Tasks:**

- **Ping Loop:** Sends ping every 30 seconds to keep connections alive
- **Cleanup Loop:** Removes stale connections (no ping for 90+ seconds)

**Message Types:**

- `connection.established`: Initial handshake
- `queue.snapshot`: Full job list on connect
- `job.created`: New job added
- `job.accepted`: Job entered processing queue
- `job.progress`: Phase and progress percentage updates
- `job.provider_update`: Per-model status (started, tokens, cost, error)
- `job.log`: Real-time log streaming
- `job.completed`: Processing finished with results
- `job.failed`: Processing error
- `job.deleted`: Job removed
- `ping`/`pong`: Heartbeat mechanism

#### 9. API Endpoints

**Authentication API (`/api/auth/*`):**

| Endpoint | Method | Purpose | Auth Required |
|----------|--------|---------|---------------|
| `/api/auth/config` | GET | Get MSAL configuration for frontend | No |
| `/api/auth/validate` | POST | Validate access token | No |
| `/api/auth/user` | GET | Get current user info | Yes (bypass in dev) |
| `/api/auth/logout` | POST | Get logout URL for MSAL | No |

**Configuration API (`/api/config/*`):**

| Endpoint | Method | Purpose | Auth Required |
|----------|--------|---------|---------------|
| `/api/config/models` | GET | List available AI models | Yes (bypass in dev) |
| `/api/config/defaults` | GET | Get default model configuration | Yes (bypass in dev) |
| `/api/config/estimate` | POST | Estimate processing cost | Yes (bypass in dev) |
| `/api/config/validate` | POST | Validate model configuration | Yes (bypass in dev) |
| `/api/config/system` | GET | Get system information | Yes (bypass in dev) |

**Jobs API (`/api/jobs/*`):**

| Endpoint | Method | Purpose | Auth Required |
|----------|--------|---------|---------------|
| `/api/jobs` | POST | Create new jobs (multipart file upload) | Yes (bypass in dev) |
| `/api/jobs` | GET | List user's jobs (paginated) | Yes (bypass in dev) |
| `/api/jobs/{id}` | GET | Get specific job details | Yes (bypass in dev) |
| `/api/jobs/{id}` | DELETE | Delete job and files | Yes (bypass in dev) |
| `/api/jobs/{id}/download` | GET | Download CSV result (binary) | Yes (bypass in dev) |
| `/api/jobs/{id}/logs` | GET | Get job logs (paginated) | Yes (bypass in dev) |
| `/api/jobs/batch-download` | POST | Download multiple CSVs as ZIP | Yes (bypass in dev) |
| `/api/jobs/stats` | GET | Get job statistics for user | Yes (bypass in dev) |
| `/api/jobs/cleanup` | POST | Clean up expired jobs | Yes (bypass in dev) |

**Health Endpoint:**

- `/health` (GET): System health with queue stats, WebSocket connections, config info

**WebSocket Endpoint:**

- `/ws` (WebSocket): Real-time bidirectional communication with token-based auth

---

## Frontend Architecture

### Technology Stack

**Core Framework:** React 18.2 with TypeScript 5.2

- Chosen for component-based architecture and type safety
- Concurrent rendering features for smooth UI updates
- Strict mode enabled for development

**Build System:** Vite 5.0

- Fast HMR (Hot Module Replacement) for development
- Optimized production builds with code splitting
- Environment variable injection at build time

**State Management:**

- **Zustand 4.4:** Global client state (jobs, connection status)
- **TanStack Query 5.8:** Server state management, caching, background refetching
- **MSAL React 2.1:** Authentication state

**UI Framework:**

- **Tailwind CSS 3.3:** Utility-first styling
- **Lucide React:** Icon system (tree-shakeable)
- **Custom Components:** Reusable UI library

**Key Dependencies:**

- `react`, `react-dom` - Core React libraries
- `@azure/msal-browser`, `@azure/msal-react` - Microsoft authentication
- `axios` - HTTP client with interceptors
- `zustand` - State
management
- `@tanstack/react-query` - Server state and caching
- `lucide-react` - Icon components
- `tailwind-merge`, `clsx` - Dynamic className utilities

### Directory Structure

```mermaid
graph LR
    subgraph Frontend["frontend/src/ - React Application"]
        Main["main.tsx<br/>Entry point"]
        App["App.tsx<br/>Root component"]
        subgraph Components["components/"]
            subgraph AuthComp["auth/"]
                AuthProvider["AuthProvider.tsx<br/>MSAL wrapper"]
                AuthGuard["AuthGuard.tsx<br/>Route protection"]
                LoginPage["LoginPage.tsx<br/>Login UI"]
            end
            subgraph UploadComp["upload/"]
                UploadPanel["UploadPanel.tsx<br/>File upload"]
                ModelSelector["ModelSelector.tsx<br/>Model config"]
                CostEstimator["CostEstimator.tsx<br/>Cost preview"]
            end
            subgraph QueueComp["queue/"]
                QueueView["QueueView.tsx<br/>Job list"]
                JobCard["JobCard.tsx<br/>Job summary"]
                JobAccordion["JobAccordion.tsx<br/>Details view"]
                ProviderChips["ProviderChips.tsx<br/>Status badges"]
            end
            subgraph UIComp["ui/"]
                Button["Button.tsx"]
                Card["Card.tsx"]
                ProgressBar["ProgressBar.tsx"]
            end
            Dashboard["Dashboard.tsx<br/>Main layout"]
        end
        subgraph Services["services/"]
            APIClient["api.ts<br/>Axios client"]
            WSClient["websocket.ts<br/>WS client"]
        end
        subgraph Stores["store/"]
            AuthStore["authStore.ts<br/>Auth state"]
            JobStore["jobStore.ts<br/>Job state"]
        end
        subgraph Hooks["hooks/"]
            UseJobs["useJobs.ts"]
            UseWS["useWebSocket.ts"]
        end
        Types["types/api.ts<br/>TypeScript defs"]
    end

    Main --> App
    App --> Components
    Components --> Services
    Components --> Stores
    Components --> Hooks
    Services --> Types
    Stores --> Types

    style Frontend fill:#fff3e0
    style Components fill:#e1f5fe
    style Services fill:#f3e5f5
    style Stores fill:#e8f5e9
```

### State Management Architecture

#### Authentication Store (`store/authStore.ts`)

**Zustand Store with Persistence:**

```typescript
interface AuthState {
  isAuthenticated: boolean
  user: User | null
  authConfig: AuthConfig | null
  isLoading: boolean
  error: string | null

  // Actions
  login(accessToken: string): Promise
  logout(): Promise
  checkAuth(): Promise
}
```

**Key Features:**

- Persists to localStorage (excludes sensitive data)
- Automatic token validation on mount
- Handles MSAL redirect responses
- Manages logout flow with Azure AD

**Login Flow:**

```mermaid
stateDiagram-v2
    [*] --> Unauthenticated
    Unauthenticated --> InitAuth: Load app
    InitAuth --> GetConfig: GET /api/auth/config
    GetConfig --> LoginScreen: Show login page
    LoginScreen --> MSALRedirect: User clicks login
    MSALRedirect --> AzureLogin: Redirect to Microsoft
    AzureLogin --> Authenticating: User enters credentials
    Authenticating --> MFACheck: Credentials valid
    MFACheck --> TokenExchange: MFA complete (if required)
    TokenExchange --> RedirectBack: Auth code received
    RedirectBack --> HandleRedirect: MSAL.handleRedirectPromise()
    HandleRedirect --> ValidateToken: POST /api/auth/validate
    ValidateToken --> Authenticated: Token valid
    ValidateToken --> LoginScreen: Token invalid
    Authenticated --> ConnectWS: Connect WebSocket
    ConnectWS --> Dashboard: Show main UI
    Dashboard --> [*]

    note right of TokenExchange
        PKCE flow
        No client secret
    end note

    note right of ValidateToken
        Backend checks:
        - exp claim
        - aud claim
        - Extracts user ID
    end note
```

#### Job Store (`store/jobStore.ts`)

**Zustand Store for Job Queue:**

```typescript
interface JobState {
  jobs: Record<string, Job>  // Job registry by ID
  connectionStatus: 'connecting' | 'connected' | 'disconnected' | 'error'
  selectedModels: ModelConfiguration | null
  availableModels: ModelInfo[]

  // Job Management
  addJob(job: Job): void
  updateJob(id: string, updates: Partial<Job>): void
  updateProvider(jobId, modelKey, update): void
  addLog(jobId, logEntry): void
  removeJob(id: string): void

  // WebSocket Connection
  connectWebSocket(): void
  disconnectWebSocket(): void
  setConnectionStatus(status): void

  // Model Configuration
  loadAvailableModels(): Promise
  loadDefaultConfig(): Promise

  // Selectors
  getActiveJobs(): Job[]
  getCompletedJobs(): Job[]
  getFailedJobs(): Job[]
  getJobsByStatus(status): Job[]
}
```

**WebSocket Integration:**

- Sets up event handlers for all WebSocket message types
- Updates job state in real-time as messages arrive
- Manages connection status for UI indicators
- Automatically reconnects on disconnect

### Component Architecture

#### Dashboard Component (`components/Dashboard.tsx`)

**Layout Structure:**

```mermaid
graph TB
    subgraph Dashboard["Dashboard Layout"]
        subgraph Header["Header Bar"]
            Logo["Logo + Title"]
            ConnStatus["Connection Status<br/>Indicator"]
            Stats["Quick Stats<br/>(Active/Complete/Failed)"]
            UserMenu["User Info + Logout"]
        end
        subgraph Main["Main Content Area"]
            subgraph Upload["Upload Panel"]
                FileSelect["Multi-file Selection<br/>(Drag & Drop)"]
                ModelConfig["Model Configuration<br/>(Primary + Consolidation)"]
                CostEst["Cost Estimation<br/>(Real-time)"]
            end
            subgraph Queue["Queue View"]
                Active["Active Jobs<br/>(Progress Bars)"]
                Complete["Completed Jobs<br/>(Download Links)"]
                Failed["Failed Jobs<br/>(Error Details)"]
                Batch["Batch Actions<br/>(Multi-download)"]
            end
        end
        subgraph Footer["Footer Bar"]
            Version["Version Info"]
            PoweredBy["AI Model Credits"]
            UserInfo["Current User"]
        end
    end

    Header --> Main
    Main --> Footer
    Upload -.-> Queue

    style Header fill:#f5f5f5
    style Upload fill:#e3f2fd
    style Queue fill:#e8f5e9
    style Footer fill:#f5f5f5
```

**Real-Time Features:**

- Connection status indicator with manual reconnect button
- Live job count badges (processing, completed, failed)
- Auto-refresh on WebSocket disconnect fallback

#### Upload Panel (`components/upload/UploadPanel.tsx`)

**Features:**

- Multi-file drag-and-drop with validation
- Model selector with primary + consolidation configuration
- Real-time cost estimation before upload
- Progress indication during upload
- File size and type validation

**Upload Workflow:**

```mermaid
stateDiagram-v2
    [*] --> FileSelection: User drops/selects files
    FileSelection --> Validation: Files selected
    Validation --> ModelConfig: Validation passed
    Validation --> Error: Validation failed
    Error --> FileSelection: Fix issues
    ModelConfig --> CostEstimate: Models configured
    CostEstimate --> ConfirmUpload: Review cost
    ConfirmUpload --> Uploading: User confirms
    Uploading --> CreateJobs: POST /api/jobs
    CreateJobs --> JobCreated: Backend creates jobs
    JobCreated --> QueueUpdate: WebSocket job.created
    QueueUpdate --> [*]: Jobs in queue

    note right of Validation
        Size: max 200MB
        Extensions: .pdf, .pptx, .docx, .xlsx
    end note

    note right of CostEstimate
        Real-time calculation
        Based on file size + model selection
    end note
```

#### Queue View (`components/queue/QueueView.tsx`)

**Display Sections:**

- **Active Jobs:** Real-time progress bars, phase indicators, provider chips
- **Completed Jobs:** Summary stats, download button, expansion details
- **Failed Jobs:** Error messages, retry capability (future feature)

**Job Card Features:**

- Expandable accordion for detailed view
- Provider-specific status chips (color-coded)
- Real-time log streaming in expanded view
- Progress percentage
with phase labels - Token usage and cost display #### WebSocket Client (`services/websocket.ts`) **Features:** - Automatic reconnection with exponential backoff - Connection health monitoring via ping/pong - Token-based authentication via query parameter - Event-driven message handling - Window focus/visibility detection for smart reconnection **Reconnection Strategy:** ```mermaid stateDiagram-v2 [*] --> Disconnected Disconnected --> Attempt1: Wait 5s Attempt1 --> Connected: Success Attempt1 --> Attempt2: Failed Attempt2 --> Connected: Success Attempt2 --> Attempt3: Failed (wait 10s) Attempt3 --> Connected: Success Attempt3 --> GaveUp: Failed (wait 20s) GaveUp --> GaveUp: Stop retrying Connected --> Disconnected: Connection lost GaveUp --> Attempt1: Window focus note right of Attempt1 Initial delay: 5s Max attempts: 3 end note note right of Attempt2 Exponential backoff 10s delay end note note right of Attempt3 Final attempt 20s delay end note note right of GaveUp Manual reconnect only Health check every 30s end note ``` **Authentication:** ``` // WebSocket URL with token wss://domain.com/ws?token= // Backend extracts token from query param token = websocket.args.get('token') user_info = await msal_auth.validate_token(token) client = register_client(user_info['oid']) ``` --- ## Data Flow and Processing Pipeline ### Complete Processing Flow ```mermaid flowchart TD Start([User Uploads File]) --> Upload[1. FILE UPLOAD
POST /api/jobs multipart/form-data
Files + modelConfig] Upload --> JobCreate[2. JOB CREATION Backend
- Validate files
- Create Job UUID
- Save to disk
- Add to registry
- Enqueue job_id
- Broadcast: job.created] JobCreate --> WorkerPull[3. WORKER POOL
Pull from queue
Acquire semaphore] WorkerPull --> Extract[4. STAGE 1: CONTENT EXTRACTION
Phase: EXTRACT_CONTENT
- LlamaParser API call
- OCR + table detection
- Returns markdown
Progress: 25%] Extract --> ParallelAnalysis[5. STAGE 2: PARALLEL ANALYSIS
Phase: LLM_ANALYSIS
asyncio.gather simultaneous] ParallelAnalysis --> GPT5[OpenAI GPT-5
Reasoning: medium
~110 seconds] ParallelAnalysis --> Claude[Anthropic Sonnet 4
Two-call approach
~78 seconds] ParallelAnalysis --> Gemini[Google Gemini 2.5
Thinking enabled
~49 seconds] GPT5 --> Gather[Collect Results
Total time = slowest model 110s
Progress: 75%] Claude --> Gather Gemini --> Gather Gather --> Consolidate[6. STAGE 3: CONSOLIDATION
Phase: CONSOLIDATION
- Format model results
- Load strategy template
- Execute consolidation model
- Smart deduplication
- Validate 'assets' key
Progress: 80%] Consolidate --> Expand[7. STAGE 4: MULTIPLIER EXPANSION
- Extract multiplier arrays
- Cartesian product
- 3 sizes × 5 markets = 15 assets
- Validate quantity] Expand --> CSVGen[8. STAGE 5: CSV GENERATION
Phase: CSV_GENERATION
- Convert to CSV rows
- Async file write
- Create JobSummary
- Mark COMPLETED
Progress: 100%] CSVGen --> WSUpdate[9. WEBSOCKET UPDATE
Broadcast: job.completed
- resultCsvUrl
- summary data] WSUpdate --> UIUpdate[10. FRONTEND UPDATE
- JobStore updates
- UI re-renders
- Download button active
- Summary displayed]
    UIUpdate --> End([Processing Complete])

    style ParallelAnalysis fill:#e1f5ff
    style GPT5 fill:#10a37f
    style Claude fill:#d4a373
    style Gemini fill:#4285f4
    style Gather fill:#e1f5ff
```

### Parallel Processing Optimization

```mermaid
gantt
    title Processing Time Comparison
    dateFormat X
    axisFormat %S

    section Sequential
    GPT-5 (110s)   :0, 110
    Claude (78s)   :110, 188
    Gemini (49s)   :188, 237
    Total 237s     :milestone, 237, 237

    section Parallel
    GPT-5 (110s)   :0, 110
    Claude (78s)   :0, 78
    Gemini (49s)   :0, 49
    Total 110s     :milestone, 110, 110
```

**Performance Gain:**
- Sequential: 237 seconds (sum of all models)
- Parallel: 110 seconds (max of all models)
- **Speedup: 2.15x faster**

**Implementation:**

```python
# Create tasks for all models
tasks = [
    asyncio.create_task(openai_provider.generate_response(...)),
    asyncio.create_task(anthropic_provider.generate_response(...)),
    asyncio.create_task(google_provider.generate_response(...))
]

# Execute all concurrently; failures are returned as exception objects
results = await asyncio.gather(*tasks, return_exceptions=True)

# Process results (gather returns once the slowest model has finished)
```

---

## Authentication and Security

### Microsoft Azure AD Integration

**Authentication Flow:** Authorization code flow with PKCE (Proof Key for Code Exchange), public client

**Why PKCE:**
- No client secret required (secure for SPAs)
- More secure than implicit flow
- Recommended by Microsoft for browser-based apps
- Frontend-initiated, backend validates

**Complete Authentication Flow:**

```
1. Frontend Initialization:
   GET /api/auth/config
   └─ Returns: { clientId, authority, redirectUri, devMode }

2. User Clicks "Sign in with Microsoft":
   MSAL.loginRedirect({
     scopes: ['openid', 'profile', 'User.Read'],
     redirectUri: 'https://domain.com/brief-extractor/'
   })

3. Redirect to Microsoft Login:
   User authenticates with work/school account
   MFA if configured in Azure AD

4. Microsoft Redirects Back:
   https://domain.com/brief-extractor/?code=...&state=...

5. MSAL Exchanges Code for Token:
   - Frontend MSAL library handles token exchange
   - Receives access token (JWT)
   - Token valid for ~1 hour

6. Frontend Validates Token:
   POST /api/auth/validate
   Body: { "accessToken": "" }
   └─ Backend validates:
      - JWT signature (future: using Azure JWKS)
      - Expiration (exp claim)
      - Audience (aud claim)
      - Returns user info if valid

7. Store Token:
   localStorage.setItem('accessToken', token)

8. All Subsequent Requests:
   Authorization: Bearer
   ├─ API requests: Axios interceptor adds header
   └─ WebSocket: Query parameter ?token=
```

### Security Mechanisms

**Transport Security:**
- HTTPS enforced in production (`HTTPS_ONLY=true`)
- WSS (WebSocket Secure) for real-time communication
- Secure cookies in production (`SECURE_COOKIES=true`)

**Authentication Security:**
- JWT token validation on every request
- Token expiration enforcement (Azure AD TTL: ~1 hour)
- No client secrets in frontend code (PKCE flow)
- Automatic logout on 401 responses

**Authorization:**
- User ID extraction from validated JWT (`oid` claim)
- All jobs tagged with `user_id`
- API endpoints filter by `user_id`
- Users cannot access other users' jobs/files

**Input Validation:**
- File extension whitelist
- File size limits (200MB default)
- Filename sanitization (remove special chars)
- Request payload size limits
- CORS restrictions to allowed origins

**Data Isolation:**
- Per-user job filtering in all endpoints
- WebSocket broadcasts filtered by `user_id`
- File storage uses job-specific UUIDs
- No shared state between users

**Development Mode Security:**

```python
# DEV_MODE bypasses authentication - NEVER use in production!
if DEV_MODE:
    # Creates mock user without validation
    return {'oid': 'dev-user-id', 'name': 'Development User'}
else:
    # Full JWT validation required
    validate_token(access_token)
```

### Azure AD Configuration Requirements

**App Registration Settings:**
- **Platform:** Single-page application
- **Redirect URI:** `https://ai-sandbox.oliver.solutions/brief-extractor/`
- **Supported Account Types:** Single tenant or multi-tenant
- **API Permissions:** Microsoft Graph → User.Read (delegated)
- **Token Configuration:** ID tokens enabled for implicit flow

**Required Environment Variables (Backend):**

```bash
MSAL_CLIENT_ID=
MSAL_TENANT_ID=
MSAL_AUTHORITY=https://login.microsoftonline.com/
MSAL_REDIRECT_URI=https://ai-sandbox.oliver.solutions/brief-extractor/
DEV_MODE=false  # CRITICAL: Must be false in production
```

---

## WebSocket Real-Time Communication

### Architecture

**Connection Model:** Persistent bidirectional communication
- Uses native WebSocket API (browser) and Quart websocket (server)
- One connection per user session (can have multiple tabs = multiple connections)
- Automatic reconnection on network failures

### Message Protocol

**Message Structure:**

```json
{
  "type": "message_type",
  "timestamp": "2025-10-07T17:45:08.015Z",
  "jobId": "uuid",
  "...": "message-specific fields"
}
```

**Message Types (Server → Client):**

**Connection Management:**

```json
{
  "type": "connection.established",
  "clientId": "uuid",
  "userId": "user-oid",
  "connectedAt": "2025-10-07T17:40:00.000Z"
}
```

**Queue Snapshot (sent on connect):**

```json
{
  "type": "queue.snapshot",
  "jobs": [Job, Job, ...]
  // All user's jobs
}
```

**Job Lifecycle:**

```json
// Job created
{
  "type": "job.created",
  "job": {Job object}
}

// Job accepted into queue
{
  "type": "job.accepted",
  "jobId": "uuid"
}

// Progress update
{
  "type": "job.progress",
  "jobId": "uuid",
  "phase": "LLM_ANALYSIS",
  "progressPct": 45,
  "stepLabel": "Analyzing with Claude Sonnet 4",
  "providerUpdates": {
    "openai-gpt5": {
      "status": "success",
      "tokensIn": 5000,
      "tokensOut": 3000,
      "costUsd": 0.045,
      "latencyMs": 85000
    }
  }
}

// Provider-specific update
{
  "type": "job.provider_update",
  "jobId": "uuid",
  "modelKey": "anthropic-sonnet4",
  "update": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514",
    "status": "success",
    "tokensIn": 6000,
    "tokensOut": 2500,
    "costUsd": 0.055,
    "latencyMs": 78000
  }
}

// Real-time log entry
{
  "type": "job.log",
  "jobId": "uuid",
  "logEntry": {
    "timestamp": "2025-10-07T17:42:46.474Z",
    "level": "INFO",
    "message": "Consolidation completed: 9 base deliverables"
  }
}

// Job completion
{
  "type": "job.completed",
  "jobId": "uuid",
  "resultCsvUrl": "/api/jobs/{uuid}/download",
  "summary": {
    "docType": "presentation",
    "assetsExtracted": 19,
    "confidenceScore": 0.95,
    "costUsdTotal": 0.2759,
    "tokensTotal": 41322,
    "processingTimeSeconds": 291.1
  }
}

// Job failure
{
  "type": "job.failed",
  "jobId": "uuid",
  "error": "Consolidation failed: Response missing 'assets' key"
}

// Job deleted
{
  "type": "job.deleted",
  "jobId": "uuid"
}
```

**Heartbeat:**

```json
// Server → Client (every 30s)
{
  "type": "ping",
  "timestamp": "2025-10-07T17:45:00.000Z"
}

// Client → Server
{
  "type": "pong"
}
```

### Frontend WebSocket Implementation

**Connection Management (`services/websocket.ts`):**

```typescript
// Excerpt: VITE_WS_URL, accessToken and handleMessage() are defined elsewhere in the module
class WebSocketClient {
  private ws: WebSocket | null = null
  private reconnectInterval = 5000                // ms (initial)
  private readonly maxReconnectInterval = 60000   // ms
  private readonly maxReconnectAttempts = 3
  private reconnectAttempts = 0

  connect() {
    const wsUrl = `${VITE_WS_URL}/ws?token=${accessToken}`
    this.ws = new WebSocket(wsUrl)

    this.ws.onopen = () => {
      // Reset reconnection counters
      // Start ping interval (30s)
      // Notify connection handlers
    }

    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data)
      this.handleMessage(message)  // Route to event handlers
    }

    this.ws.onclose = () => {
      // Stop ping interval
      // Schedule reconnection (if not intentional)
    }
  }

  scheduleReconnect() {
    if (this.reconnectAttempts >= this.maxReconnectAttempts) {
      // Stop trying after 3 attempts
      return
    }
    this.reconnectAttempts += 1

    setTimeout(() => {
      this.connect()
      // Exponential backoff, capped at maxReconnectInterval
      this.reconnectInterval = Math.min(this.reconnectInterval * 2, this.maxReconnectInterval)
    }, this.reconnectInterval)
  }
}
```

**State Updates via Zustand:**

```typescript
// Event handlers registered in jobStore
wsClient.on('job.progress', (message) => {
  updateJob(message.jobId, {
    phase: message.phase,
    progressPct: message.progressPct,
    stepLabel: message.stepLabel,
    providerUpdates: message.providerUpdates
  })
  // React components re-render automatically
})
```

### Connection Resilience

**Auto-Reconnection Scenarios:**
- Network interruption
- Server restart
- Apache/Nginx reload
- Temporary backend unavailability

**Smart Reconnection:**
- Window focus event: Reset attempts, immediate reconnect
- Page visibility change: Reduce penalty, try reconnect
- Health check: Ping `/health` every 30s when disconnected
- Connection restoration: Resume from last state

**Fallback Without WebSocket:**
- App remains fully functional
- Users can still upload, view, download
- Progress updates require manual page refresh
- No real-time log streaming (logs available on demand)

---

## API Reference

### Authentication API

#### `GET /api/auth/config`

Get MSAL configuration for frontend initialization.

**Request:** None (no auth required)

**Response:**

```json
{
  "config": {
    "clientId": "9079054c-9620-4757-a256-23413042f1ef",
    "authority": "https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385",
    "redirectUri": "https://ai-sandbox.oliver.solutions/brief-extractor/",
    "devMode": false
  },
  "devMode": false
}
```

#### `POST /api/auth/validate`

Validate an access token and return user information.
**Request:**

```json
{
  "accessToken": "eyJ0eXAiOiJKV1QiLCJhbGc..."
}
```

**Response (Success):**

```json
{
  "valid": true,
  "user": {
    "id": "38abcbd2-7558-4f64-aec2-fafc7807552c",
    "username": "user@domain.com",
    "name": "User Name",
    "roles": ["user"]
  }
}
```

**Response (Invalid):**

```json
{
  "valid": false,
  "error": "invalid_token",
  "message": "Token is invalid or expired"
}
```

#### `GET /api/auth/user`

Get current authenticated user information.

**Headers:** `Authorization: Bearer `

**Response:**

```json
{
  "user": {
    "id": "38abcbd2-7558-4f64-aec2-fafc7807552c",
    "username": "user@domain.com",
    "name": "User Name",
    "roles": ["user"]
  }
}
```

#### `POST /api/auth/logout`

Get logout URL for proper Microsoft session termination.

**Request:**

```json
{
  "redirectUri": "https://ai-sandbox.oliver.solutions/brief-extractor/"
}
```

**Response:**

```json
{
  "logoutUrl": "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/logout?post_logout_redirect_uri=..."
}
```

### Jobs API

#### `POST /api/jobs`

Create new processing jobs from uploaded files.

**Headers:**
- `Authorization: Bearer `
- `Content-Type: multipart/form-data`

**Request Body:**
- `file_0`, `file_1`, ... : File uploads
- `modelConfig` (optional): JSON string with model configuration

**Model Config Structure:**

```json
{
  "primaryModels": ["openai-gpt5", "anthropic-sonnet4", "google-gemini25"],
  "consolidationModel": "openai-gpt5",
  "minimumSuccessThreshold": 1
}
```

**Response:**

```json
{
  "jobs": [
    {
      "id": "4614818d-38c6-4eac-aa39-659c89d90836",
      "fileName": "brief.pdf",
      "fileSize": 1048576,
      "createdAt": "2025-10-07T17:40:00.000Z",
      "updatedAt": "2025-10-07T17:40:00.000Z",
      "userId": "38abcbd2-7558-4f64-aec2-fafc7807552c",
      "phase": "QUEUED",
      "progressPct": 0,
      "stepLabel": "Queued for processing",
      "providerUpdates": {},
      "error": null,
      "resultCsvUrl": null,
      "summary": null,
      "logs": [],
      "modelConfig": {
        "primaryModels": ["openai-gpt5", "anthropic-sonnet4", "google-gemini25"],
        "consolidationModel": "openai-gpt5",
        "minimumSuccessThreshold": 1
      }
    }
  ],
  "errors": []
}
```

**Error Responses:**

```json
// No files
{
  "error": "no_files",
  "message": "No files provided for upload"
}

// Invalid model config
{
  "error": "invalid_model_config",
  "message": "Invalid model configuration: ..."
}

// File too large
{
  "error": "file_too_large",
  "message": "File size exceeds 200MB limit"
}
```

#### `GET /api/jobs`

List jobs for the current authenticated user.

**Headers:** `Authorization: Bearer `

**Query Parameters:**
- `limit` (optional): Max results (default: 50, max: 100)
- `offset` (optional): Skip count for pagination (default: 0)
- `status` (optional): Filter by phase (e.g., "COMPLETED")

**Response:**

```json
{
  "jobs": [Job, Job, ...],
  "pagination": {
    "limit": 50,
    "offset": 0,
    "count": 15
  }
}
```

#### `GET /api/jobs/{job_id}`

Get detailed information for a specific job.

**Headers:** `Authorization: Bearer `

**Response:** Single `Job` object (same structure as POST /api/jobs)

**Error:**

```json
{
  "error": "not_found",
  "message": "Job not found or access denied"
}
```

#### `DELETE /api/jobs/{job_id}`

Delete a job and all associated files.
**Headers:** `Authorization: Bearer `

**Response:**

```json
{
  "message": "Job deleted successfully"
}
```

#### `GET /api/jobs/{job_id}/download`

Download the CSV result file for a completed job.

**Headers:** `Authorization: Bearer `

**Response:**
- **Content-Type:** `text/csv; charset=utf-8`
- **Content-Disposition:** `attachment; filename="brief-20251007174508.csv"`
- **Body:** CSV file content

**Error (Job Not Complete):**

```json
{
  "error": "not_ready",
  "message": "Job has not completed processing yet"
}
```

#### `POST /api/jobs/batch-download`

Download multiple CSV files as a ZIP archive.

**Headers:** `Authorization: Bearer `

**Request:**

```json
{
  "jobIds": ["uuid1", "uuid2", "uuid3"]
}
```

**Response:**
- **Content-Type:** `application/zip`
- **Content-Disposition:** `attachment; filename="brief-extractor-results-{timestamp}.zip"`
- **Body:** ZIP file containing CSV files

#### `GET /api/jobs/{job_id}/logs`

Get processing logs for a specific job.

**Headers:** `Authorization: Bearer `

**Query Parameters:**
- `limit` (optional): Max log entries (default: 100)
- `level` (optional): Filter by level (DEBUG, INFO, WARNING, ERROR)

**Response:**

```json
{
  "logs": [
    {
      "timestamp": "2025-10-07T17:42:46.474Z",
      "level": "INFO",
      "message": "Starting consolidation with 2 model results using openai-gpt5"
    }
  ],
  "count": 150
}
```

#### `GET /api/jobs/stats`

Get job processing statistics for the current user.

**Headers:** `Authorization: Bearer `

**Response:**

```json
{
  "stats": {
    "total": 25,
    "completed": 20,
    "failed": 2,
    "processing": 3,
    "queued": 0,
    "totalAssetsExtracted": 487,
    "totalCostUsd": 5.67,
    "totalProcessingTime": 3600.5,
    "averageAssetsPerJob": 24.35,
    "averageCostPerJob": 0.283
  }
}
```

### Configuration API

#### `GET /api/config/models`

List all available AI models with pricing and capabilities.
**Response:**

```json
{
  "models": [
    {
      "key": "openai-gpt5",
      "name": "GPT-5",
      "provider": "OpenAI",
      "description": "Latest OpenAI model with advanced reasoning capabilities",
      "costPer1mInput": 2.50,
      "costPer1mOutput": 10.00,
      "canBePrimary": true,
      "canBeConsolidation": true
    },
    {
      "key": "anthropic-sonnet4",
      "name": "Claude Sonnet 4",
      "provider": "Anthropic",
      "description": "Balanced performance and cost",
      "costPer1mInput": 3.00,
      "costPer1mOutput": 15.00,
      "canBePrimary": true,
      "canBeConsolidation": true
    }
  ]
}
```

#### `GET /api/config/defaults`

Get default model configuration.

**Response:**

```json
{
  "config": {
    "primaryModels": ["openai-gpt5", "anthropic-sonnet4", "google-gemini25"],
    "consolidationModel": "openai-gpt5",
    "minimumSuccessThreshold": 1
  }
}
```

#### `POST /api/config/estimate`

Estimate processing cost before uploading.

**Request:**

```json
{
  "modelConfig": {
    "primaryModels": ["openai-gpt5", "anthropic-sonnet4"],
    "consolidationModel": "anthropic-opus4"
  },
  "fileSizeBytes": 1048576,
  "estimatedTokens": 10000
}
```

**Response:**

```json
{
  "estimatedCostUsd": 0.45,
  "breakdown": {
    "openai-gpt5": 0.15,
    "anthropic-sonnet4": 0.12,
    "anthropic-opus4": 0.18
  },
  "estimatedTokens": {
    "input": 8000,
    "output": 6000,
    "total": 14000
  },
  "estimatedTime": "90-180 seconds"
}
```

#### `POST /api/config/validate`

Validate model configuration before submission.
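
The checks behind this endpoint can be illustrated with a minimal sketch. It is not the backend's actual implementation: `AVAILABLE_MODELS` and `validate_model_config` are hypothetical names, the set of model keys is assumed from the keys that appear elsewhere in this document, and the messages for an empty primary list or an unknown consolidation model are assumed wording. Only the "not available" error, the single-model warning, and the `modelCount` shape follow the example documented for this endpoint.

```python
from typing import Any, Dict, List

# Assumption: model keys collected from this document; the real registry may differ.
AVAILABLE_MODELS = {
    "openai-gpt5",
    "anthropic-sonnet4",
    "anthropic-opus4",
    "google-gemini25",
}


def validate_model_config(model_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical validator that returns the documented response shape."""
    primary: List[str] = list(model_config.get("primaryModels", []))
    consolidation = model_config.get("consolidationModel")
    errors: List[str] = []
    warnings: List[str] = []

    if not primary:
        # Assumed wording, not taken from the source.
        errors.append("At least one primary model is required")
    for key in primary:
        if key not in AVAILABLE_MODELS:
            errors.append(f"Primary model '{key}' is not available")
    if consolidation not in AVAILABLE_MODELS:
        # Assumed wording, not taken from the source.
        errors.append(f"Consolidation model '{consolidation}' is not available")

    if len(primary) == 1:
        warnings.append(
            "Using only 1 primary model - consider using 2-3 for better accuracy"
        )

    consolidation_count = 1 if consolidation else 0
    return {
        "valid": not errors,
        "errors": errors,
        "warnings": warnings,
        "modelCount": {
            "primary": len(primary),
            "consolidation": consolidation_count,
            "total": len(primary) + consolidation_count,
        },
    }
```

Collecting every error and warning in one pass, rather than stopping at the first problem, lets the frontend show all issues to the user at once.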
**Request:**

```json
{
  "modelConfig": {
    "primaryModels": ["invalid-model"],
    "consolidationModel": "openai-gpt5"
  }
}
```

**Response:**

```json
{
  "valid": false,
  "errors": [
    "Primary model 'invalid-model' is not available"
  ],
  "warnings": [
    "Using only 1 primary model - consider using 2-3 for better accuracy"
  ],
  "modelCount": {
    "primary": 1,
    "consolidation": 1,
    "total": 2
  }
}
```

---

## Data Models and Schemas

### Job Data Model

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

# JobPhase, ProviderUpdate, JobSummary, LogEntry and ModelConfiguration
# are defined elsewhere in the codebase.

@dataclass
class Job:
    id: str                                      # UUID
    file_name: str                               # Original filename
    file_size: int                               # Bytes
    created_at: datetime                         # UTC timestamp
    updated_at: datetime                         # UTC timestamp
    user_id: str                                 # Azure AD user OID
    upload_path: str                             # Disk path
    output_path: Optional[str]                   # CSV path (when complete)
    phase: JobPhase                              # Current processing phase
    progress_pct: int                            # 0-100
    step_label: str                              # Human-readable step
    provider_updates: Dict[str, ProviderUpdate]  # Per-model status
    error: Optional[str]                         # Error message if failed
    result_csv_url: Optional[str]                # Download endpoint
    summary: Optional[JobSummary]                # Completion summary
    logs: List[LogEntry]                         # Processing logs
    model_config: ModelConfiguration             # AI model settings
```

### Job Phases

```mermaid
stateDiagram-v2
    [*] --> QUEUED: Job created
    QUEUED --> EXTRACT_CONTENT: Worker picks up
    EXTRACT_CONTENT --> LLM_ANALYSIS: Content extracted
    LLM_ANALYSIS --> CONSOLIDATION: Analysis complete
    CONSOLIDATION --> CSV_GENERATION: Results consolidated
    CSV_GENERATION --> COMPLETED: CSV written

    EXTRACT_CONTENT --> FAILED: Extraction error
    LLM_ANALYSIS --> FAILED: All models failed
    CONSOLIDATION --> FAILED: Consolidation error
    CSV_GENERATION --> FAILED: Write error

    COMPLETED --> [*]
    FAILED --> [*]

    note right of QUEUED
        Progress: 0%
        Waiting for worker
    end note

    note right of EXTRACT_CONTENT
        Progress: 10-25%
        LlamaParser API
    end note

    note right of LLM_ANALYSIS
        Progress: 25-75%
        Parallel model execution
    end note

    note right of CONSOLIDATION
        Progress: 75-90%
        Single model merging
    end note

    note right of CSV_GENERATION
        Progress: 90-100%
        File write
    end note
```

### Provider Update Model

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProviderUpdate:
    provider: str                  # 'openai', 'anthropic', 'google'
    model: str                     # 'gpt-5', 'claude-sonnet-4', etc.
    status: str                    # 'started', 'success', 'error'
    started_at: Optional[str]      # ISO timestamp
    completed_at: Optional[str]    # ISO timestamp
    latency_ms: Optional[float]    # Processing duration
    tokens_in: Optional[int]       # Input tokens
    tokens_out: Optional[int]      # Output tokens
    tokens_cached: Optional[int]   # Cached tokens (cost reduction)
    cost_usd: Optional[float]      # Estimated cost
    error: Optional[str]           # Error message if failed
```

### Base Deliverable Schema

**Purpose:** Intermediate format with multiplier arrays (before expansion)

```python
from typing import List, Optional

from pydantic import BaseModel

class BaseDeliverable(BaseModel):
    # Metadata (String Fields)
    title: str                                           # Required
    status: Optional[str] = ""                           # "Draft", "In Progress", "Final"
    category: Optional[str] = ""                         # "Paid Social - Meta Feed"
    media: Optional[str] = ""                            # "IMAGE", "VIDEO", "COPY"
    asset_type: Optional[str] = ""                       # "JPG", "PNG", "MP4"
    brand_identifier: Optional[str] = ""                 # "adidas TERREX"

    # Multiplier Arrays (Expansion Fields)
    technical_specifications: Optional[List[str]] = []   # ["1080x1080", "1920x1080"]
    language_country_market: Optional[List[str]] = []    # ["EN-UK", "DE-DE", "IT-IT"]

    # Dates and References (String Fields)
    review_date: Optional[str] = ""                      # "2024-08-08"
    live_date: Optional[str] = ""                        # "08/08"
    end_date: Optional[str] = ""                         # "2025-12-31"
    reference_material: Optional[str] = ""               # URLs or notes

    # Metadata (String Fields)
    quantity: Optional[str] = "1"                        # For validation
    page_number: Optional[str] = ""                      # "3-4"
    priority_level: Optional[str] = ""                   # "High"
    creative_direction: Optional[str] = ""               # Design requirements
```

**Multiplier Expansion Example:**

```mermaid
graph LR
    Base["Base Deliverable

Title: Hero Slider
Specs: [750x1200, 1920x853]
Markets: [IT-IT]
Quantity: 2"] subgraph Expansion["Cartesian Product Expansion"] Combo1["750x1200 × IT-IT"] Combo2["1920x853 × IT-IT"] end Asset1["Asset 1
Hero Slider (750x1200, IT-IT)
Quantity: 1"] Asset2["Asset 2
Hero Slider (1920x853, IT-IT)
Quantity: 1"]

    Base --> Expansion
    Combo1 --> Asset1
    Combo2 --> Asset2

    style Base fill:#fff3e0
    style Expansion fill:#e3f2fd
    style Asset1 fill:#e8f5e9
    style Asset2 fill:#e8f5e9
```

### Marketing Asset Schema

**Purpose:** Final individual assets for CSV export (after expansion)

```python
from typing import Optional

from pydantic import BaseModel

class MarketingAsset(BaseModel):
    # All fields become strings (arrays expanded into individual assets)
    title: str                                      # "Hero Slider (750x1200, IT-IT)"
    status: Optional[str] = ""
    category: Optional[str] = ""
    media: Optional[str] = ""
    asset_type: Optional[str] = ""
    brand_identifier: Optional[str] = ""
    technical_specifications: Optional[str] = ""    # Single value: "750x1200"
    review_date: Optional[str] = ""
    live_date: Optional[str] = ""
    end_date: Optional[str] = ""
    reference_material: Optional[str] = ""
    language_country_market: Optional[str] = ""     # Single value: "IT-IT"
    quantity: Optional[str] = "1"                   # Always "1" for individuals
    page_number: Optional[str] = ""
    priority_level: Optional[str] = ""
    creative_direction: Optional[str] = ""
```

### CSV Output Format (16 Columns)

```csv
title,category,media,asset_type,technical_specifications,language_country_market,quantity,brand_identifier,review_date,live_date,end_date,reference_material,page_number,priority_level,creative_direction,status
"Hero Slider (750x1200, IT-IT)","Wholesale - Hero Slider","IMAGE","JPG","750x1200","IT-IT","1","adidas TERREX","2024-08-08","08/08","","https://drive.google.com/...","3-4","High","Adapt as per layouts...","Draft"
```

---

## Configuration Management

### Environment Variables

**Backend Configuration (`.env` in project root):**

```bash
# =============================================================================
# API KEYS (Required)
# =============================================================================
OPENAI_API_KEY=sk-...                  # OpenAI GPT-5 access
ANTHROPIC_API_KEY=sk-ant-api03-...     # Anthropic Claude access
GOOGLE_API_KEY=AIzaSy...               # Google Gemini access
LLAMACLOUD_API_KEY=llx-...             # LlamaParser cloud service

# =============================================================================
# OPENAI CONFIGURATION
# =============================================================================
OPENAI_MODEL=gpt-5
OPENAI_REASONING_EFFORT=medium         # high, medium, low, minimal
OPENAI_TIMEOUT=3600                    # 1 hour (for long documents)
OPENAI_MAX_RETRIES=2

# =============================================================================
# ANTHROPIC CONFIGURATION
# =============================================================================
ANTHROPIC_MODEL_OPUS=claude-opus-4-1-20250805
ANTHROPIC_MODEL_SONNET=claude-sonnet-4-20250514
ANTHROPIC_TEMPERATURE=1                # Higher for creative analysis
ANTHROPIC_MAX_TOKENS=32000             # Sonnet limit (Opus: 64000)
ANTHROPIC_THINKING_BUDGET=12000        # Thinking tokens
ANTHROPIC_TIMEOUT=300                  # 5 minutes

# =============================================================================
# GOOGLE CONFIGURATION
# =============================================================================
GOOGLE_MODEL=gemini-2.5-pro
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_OUTPUT_TOKENS=100000
GOOGLE_THINKING_BUDGET=12000
GOOGLE_TIMEOUT=3600

# =============================================================================
# PROCESSING CONFIGURATION
# =============================================================================
DEFAULT_PRIMARY_MODELS=openai-gpt5,anthropic-sonnet4,google-gemini25
DEFAULT_CONSOLIDATION_MODEL=openai-gpt5
MINIMUM_SUCCESS_THRESHOLD=1            # Min models that must succeed
ENABLE_COST_ESTIMATION=true
MAX_PROCESSING_COST_USD=10.00

# =============================================================================
# MSAL AUTHENTICATION (Azure AD)
# =============================================================================
MSAL_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef
MSAL_CLIENT_SECRET=placeholder         # Not used for PKCE flow
MSAL_TENANT_ID=e519c2e6-bc6d-4fdf-8d9c-923c2f002385
MSAL_REDIRECT_URI=https://ai-sandbox.oliver.solutions/brief-extractor/
MSAL_AUTHORITY=https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385

# =============================================================================
# SECURITY AND RUNTIME
# =============================================================================
DEV_MODE=false                         # MUST be false in production!
ALLOWED_ORIGINS=https://ai-sandbox.oliver.solutions
SESSION_SECRET=
SECURE_COOKIES=true                    # true for HTTPS
HTTPS_ONLY=true                        # true for production

# =============================================================================
# JOB PROCESSING
# =============================================================================
MAX_CONCURRENT_JOBS=5                  # Parallel job processing limit
MAX_UPLOAD_SIZE_MB=200                 # Per-file upload limit
FILE_RETENTION_HOURS=24                # Auto-cleanup threshold
WS_PING_INTERVAL_SECONDS=30            # WebSocket heartbeat

# =============================================================================
# SERVER CONFIGURATION
# =============================================================================
SERVER_HOST=0.0.0.0
SERVER_PORT=8002
SERVER_WORKERS=2                       # Hypercorn workers (has no effect with serve)
```

**Frontend Configuration (`frontend/.env`):**

```bash
# Backend API and WebSocket URLs (embedded at build time)

# Production
VITE_API_URL=https://ai-sandbox.oliver.solutions/brief-extractor-back/api
VITE_WS_URL=wss://ai-sandbox.oliver.solutions/brief-extractor-back

# Local Development (comment out production, uncomment below)
# VITE_API_URL=http://localhost:8000/api
# VITE_WS_URL=ws://localhost:8000
```

**Build Configuration (`frontend/vite.config.ts`):**

```typescript
import path from 'path'
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

export default defineConfig({
  base: '/brief-extractor/',  // Deployment path prefix
  plugins: [react()],
  resolve: {
    alias: {
      '@': path.resolve(__dirname, './src')  // Import alias
    }
  },
  server: {
    port: 3000,
    proxy: {  // Dev server proxying
      '/api': {
        target: 'http://localhost:8000',
        changeOrigin: true
      },
      '/ws': {
        target: 'ws://localhost:8000',
        ws: true
      }
    }
  }
})
```

### Configuration Loading Priority
**Backend:**
1. Environment variables from `.env` file
2. Default values in `core/config.py` and `server/config_runtime.py`
3. Runtime overrides (future feature)

**Frontend:**
1. Build-time environment variables (`VITE_*`)
2. Fallback defaults in code (e.g., `/api` for VITE_API_URL)

---

## Deployment Architecture

### Production Deployment Topology

```mermaid
graph TB
    Internet["Internet
(HTTPS/WSS)"] subgraph Server["Production Server: ai-sandbox.oliver.solutions"] Apache["Apache Web Server
Port 443
- SSL/TLS Termination
- Virtual Host
- ProxyPass WebSocket
- Serve static files"] subgraph Static["Static Files"] Frontend["/brief-extractor/
/var/www/html/brief-extractor/dist/"] end subgraph Backend["Quart Application"] Hypercorn["Hypercorn ASGI Server
Port 8002
Systemd service"] Workers["5 Async Job Processors"] WSSupport["WebSocket Support"] Storage["File Storage
/server/data/"] end end subgraph External["External APIs"] OpenAI["OpenAI API"] Anthropic["Anthropic API"] Google["Google AI API"] Llama["LlamaCloud API"] AzureAD["Azure AD
Authentication"] end Internet -->|"HTTPS"| Apache Apache -->|"Proxy
/brief-extractor-back/"| Hypercorn
    Apache -.->|"Serve"| Frontend
    Hypercorn --> Workers
    Hypercorn --> WSSupport
    Workers --> Storage
    Workers --> OpenAI
    Workers --> Anthropic
    Workers --> Google
    Workers --> Llama
    Frontend --> AzureAD

    style Apache fill:#f9f9f9
    style Hypercorn fill:#e3f2fd
    style Workers fill:#e8f5e9
    style Frontend fill:#fff3e0
```

### Apache Configuration

```apache
# Brief Extractor - WebSocket and HTTP proxy
ProxyPass /brief-extractor-back/ws ws://localhost:8002/ws
ProxyPass /brief-extractor-back/ http://localhost:8002/
ProxyPassReverse /brief-extractor-back/ http://localhost:8002/

# Static frontend files
Alias /brief-extractor /var/www/html/brief-extractor/dist
<Directory /var/www/html/brief-extractor/dist>
    Options -Indexes +FollowSymLinks
    AllowOverride None
    Require all granted

    # SPA routing support
    RewriteEngine On
    RewriteBase /brief-extractor/
    RewriteRule ^index\.html$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /brief-extractor/index.html [L]
</Directory>

# Required Apache modules
# sudo a2enmod proxy proxy_http proxy_wstunnel rewrite
```

### Systemd Service Configuration

**Service File:** `/etc/systemd/system/brief-extractor.service`

```ini
[Unit]
Description=Brief Extractor Backend Service
After=network.target

[Service]
Type=simple
User=www-data
WorkingDirectory=/var/www/html/brief-extractor/backend
Environment="PATH=/var/www/html/brief-extractor/backend/venv/bin:/usr/bin"
ExecStart=/var/www/html/brief-extractor/backend/venv/bin/python -m server.app
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```

**Service Management:**

```bash
# Start service
sudo systemctl start brief-extractor

# Enable on boot
sudo systemctl enable brief-extractor

# View logs
sudo journalctl -u brief-extractor -f

# Restart after code update
sudo systemctl restart brief-extractor
```

### Build and Deployment Process

**Backend Deployment:**

```bash
# 1. Update code on server
cd /var/www/html/brief-extractor/backend
git pull origin main

# 2. Activate virtual environment
source venv/bin/activate

# 3. Install/update dependencies
pip install -r requirements_enhanced.txt

# 4. Update .env file (if needed)
nano .env

# 5. Restart service
sudo systemctl restart brief-extractor

# 6. Verify deployment
curl https://ai-sandbox.oliver.solutions/brief-extractor-back/health
```

**Frontend Deployment:**

```bash
# 1. On development machine, configure production URLs
cd frontend
nano .env  # Ensure VITE_API_URL points to production

# 2. Build for production
npm run build

# 3. Deploy to server
scp -r dist/* user@server:/var/www/html/brief-extractor/dist/

# 4. Verify deployment
# Visit: https://ai-sandbox.oliver.solutions/brief-extractor/
```

**Environment-Specific Builds:**

```bash
# Build for production server
VITE_API_URL=https://ai-sandbox.oliver.solutions/brief-extractor-back/api \
VITE_WS_URL=wss://ai-sandbox.oliver.solutions/brief-extractor-back \
npm run build

# Build for local development
VITE_API_URL=http://localhost:8000/api \
VITE_WS_URL=ws://localhost:8000 \
npm run build
```

---

## Error Handling and Logging

### Backend Logging Strategy

**Structured Logging (Structlog):**

```python
import structlog

structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="ISO"),
        structlog.processors.JSONRenderer()  # JSON for production
    ]
)

logger = structlog.get_logger(__name__)
logger.info("Event occurred", key="value", user_id=user_id)
```

**Output Format:**

```json
{
  "event": "Job processing completed",
  "logger": "server.runners.job_runner",
  "level": "info",
  "timestamp": "2025-10-07T17:45:08.132Z",
  "job_id": "uuid",
  "assets_extracted": 19,
  "cost_usd": 0.2759
}
```

**Log Levels:**
- **DEBUG:** Detailed diagnostic information (disabled in production)
- **INFO:** General informational messages (job lifecycle, API calls)
- **WARNING:** Warning conditions (token validation issues, model failures)
- **ERROR:** Error events (job failures, API errors, exceptions)

**Key Logging Points:**

**Job Lifecycle:**

```python
logger.info(f"Created job {job.id} for file {file_name} (user: {user_id})")
logger.info(f"Processing job {job_id}: {job.file_name}")
logger.info(f"Job {job_id} completed successfully: {assets} assets, ${cost}, {time}s")
logger.error(f"Job {job_id} failed: {error}", exc_info=True)
```

**AI Model Calls:**

```python
# Standard success logging
logger.info("[INITIAL] Structured output validated: 9 assets")

# Verbose error logging (only when problems occur)
logger.error("[CONSOLIDATION] ========== MISSING 'assets' KEY ==========")
logger.error(f"[CONSOLIDATION] Full raw content: {response.content}")
logger.error("[CONSOLIDATION] Debug file saved: /tmp/consolidation_debug_*.json")
```

**WebSocket Events:**

```python
logger.info(f"Registered WebSocket client {client_id} for user {user_id}")
logger.warning("WebSocket connection rejected - no valid authentication")
logger.debug(f"Broadcast message to {sent_count} clients for user {user_id}")
```

### Error Handling Patterns

**API Error Responses:**

```python
# Validation Error (400)
return jsonify({
    'error': 'invalid_request',
    'message': 'Specific validation error details'
}), 400

# Authentication Error (401)
return jsonify({
    'error': 'unauthorized',
    'message': 'Valid authentication required'
}), 401

# Not Found (404)
return jsonify({
    'error': 'not_found',
    'message': 'Job not found or access denied'
}), 404

# Internal Error (500)
return jsonify({
    'error': 'server_error',
    'message': 'Internal server error'
}), 500
```

**Job Processing Errors:**

```python
try:
    # Processing stages...
    result = await analyzer.process_document_multi_model(...)
except Exception as e:
    # Capture and report error
    error_msg = f"Job processing failed: {str(e)}"
    logger.error(f"Job {job.id} failed: {error_msg}", exc_info=True)

    # Update job state
    await progress.emit_failure(error_msg)
    job.mark_failed(error_msg)

    # Broadcast via WebSocket
    await ws_manager.broadcast_job_update(job.id, {
        'type': 'job.failed',
        'jobId': job.id,
        'error': error_msg
    })

    return False  # Job failed
```

**Partial Failure Handling:**

```python
# LLM Analysis - allow partial success
responses, metadata = await provider_manager.execute_parallel_analysis(
    models=['openai-gpt5', 'anthropic-sonnet4', 'google-gemini25'],
    minimum_success_threshold=1  # At least 1 must succeed
)

# If 2 out of 3 models fail, processing continues with 1 result
# Consolidation still occurs, just with less data diversity
```

### Debug Artifact Generation

**When Consolidation Fails:**

```python
# Automatic debug file creation
debug_file = f"/tmp/consolidation_debug_{timestamp}.json"

{
  "timestamp": "20251007_174500",
  "consolidation_model": "gpt-5",
  "raw_content": "{}",  # Empty response
  "parsed_data": {},
  "primary_analysis_results": [
    {
      "provider": "anthropic",
      "model": "claude-sonnet-4",
      "success": true,
      "deliverable_count": 9
    }
  ],
  "token_usage": {...}
}
```

**Location:** `/tmp/` directory on server

**Purpose:** Post-mortem analysis of API responses

**Includes:** Full request context, model outputs, token stats

---

## Performance and Scalability

### Performance Characteristics

**Processing Times (Typical 10-page Brief):**
- **Content Extraction:** 10-30 seconds (LlamaParser)
- **Parallel Analysis:** 50-120 seconds (limited by slowest model)
  - GPT-5: 90-110 seconds
  - Claude Sonnet: 60-80 seconds
  - Gemini: 40-50 seconds
- **Consolidation:** 60-90 seconds (single model)
- **CSV Generation:** <1 second
- **Total:** 2-4 minutes end-to-end

**Token Usage (Typical):**
- **Input:** 8,000-12,000 tokens (document + prompt)
- **Output:** 2,000-6,000 tokens per model
- **Total:** 30,000-50,000 tokens
across all models

**Cost (Typical):**
- **3-Model Analysis:** $0.20-$0.40
- **With Premium Consolidation (Opus):** +$0.15-$0.25
- **Average Per Document:** $0.25-$0.45

### Scalability Analysis

**Concurrent Processing:**

```
MAX_CONCURRENT_JOBS=5 → 5 documents processed simultaneously
Each job uses 3 primary models + 1 consolidation = 4 LLM calls
Maximum concurrent LLM calls: 5 jobs × 3 models = 15 parallel API calls
(Consolidation sequential within each job)
```

**Memory Footprint:**

```
Per Job:
- Uploaded file: ~1-50 MB (in memory during upload, then on disk)
- Extracted content: ~50-500 KB (markdown)
- LLM responses: ~10-50 KB each × 4 models = 40-200 KB
- Job metadata: ~5-10 KB

Total per job: ~1-50 MB (mostly file content)
With 5 concurrent jobs: ~5-250 MB total
```

**Network Bandwidth:**

```
Upload: User → Server
- Per document: 1-50 MB
- Rate limiting: None (could add via nginx)

LLM API Calls: Server → AI Providers
- Per request: 10-100 KB (prompts)
- Per response: 5-50 KB (structured JSON)
- Concurrent: 15 simultaneous connections

Download: Server → User
- Per CSV: 5-500 KB (typically < 100 KB)
- Batch ZIP: Up to 10 MB for large batches
```

### Bottlenecks and Optimizations

**Current Bottlenecks:**

1. **LLM API Latency:** 50-120 seconds (external dependency)
   - Mitigation: Parallel execution reduces total time
   - Future: Caching for similar documents

2. **File Upload Speed:** Network dependent
   - Mitigation: Chunked upload (future)
   - Compression: Could reduce bandwidth 50-70%

3. **Concurrent Job Limit:** MAX_CONCURRENT_JOBS=5
   - Rationale: Cost control, API rate limits
   - Tunable: Can increase to 10-20 with monitoring

**Optimization Strategies:**

**1. Token Caching (Prompt Caching):**

```python
# OpenAI and Anthropic support prompt caching
# Repeated analysis of similar documents reuses cached context
# Savings: 50-90% on input token costs
```

**2.
Result Caching (Future):**

```python
# Cache analysis results by document hash
# If same file uploaded again, return cached result
# Savings: 100% cost reduction for duplicates
```

**3. Async Everything:**

```python
# File I/O: run_in_executor() for blocking operations
# Database: Currently in-memory (future: async DB driver)
# API calls: Native async clients (AsyncOpenAI, AsyncAnthropic)
```

**4. Smart Model Selection:**

```python
# Cost-optimized: GPT-5 + Gemini (cheapest)
# Quality-optimized: All 3 models
# Speed-optimized: Sonnet + Gemini (fastest)
```

### Monitoring and Observability

**Health Check Endpoint:**

```bash
curl https://ai-sandbox.oliver.solutions/brief-extractor-back/health

{
  "status": "healthy",
  "timestamp": "2025-10-07T17:45:00.000Z",
  "queue": {
    "pending": 2,
    "active": 3
  },
  "websockets": {
    "total_connections": 5,
    "unique_users": 3
  },
  "config": {
    "devMode": false,
    "maxConcurrentJobs": 5,
    "maxUploadSize": "200MB"
  }
}
```

**Metrics to Monitor:**
- Queue depth (`queue.pending`)
- Active jobs (`queue.active`)
- WebSocket connection count
- Average processing time per job
- Error rate (failed jobs / total jobs)
- API response times
- Token usage and costs

**Logging Integration:**

```bash
# Systemd journal
sudo journalctl -u brief-extractor -f --since "1 hour ago"

# Filter by log level
sudo journalctl -u brief-extractor -p err

# Filter by job ID (log entries are JSON, so match the JSON field)
sudo journalctl -u brief-extractor | grep '"job_id": "uuid"'
```

---

## Development Guide

### Local Development Setup

**Prerequisites:**
- Python 3.13+ with virtual environment support
- Node.js 18+ with npm
- Git for version control
- Azure AD app registration (for auth testing)

**Backend Setup:**

```bash
# 1. Clone repository
git clone <repository-url>
cd adi-o3-multipass

# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # Mac/Linux
# or: venv\Scripts\activate  # Windows

# 3. Install dependencies
pip install -r requirements_enhanced.txt

# 4.
# Configure environment
cp .env.example .env
nano .env  # Add API keys, set DEV_MODE=true

# 5. Verify configuration
python -c "from core.config import config; print(config.validate_api_keys())"

# 6. Run development server
python -m server.app
# Server starts on http://0.0.0.0:8000
```

**Frontend Setup:**

```bash
# 1. Navigate to frontend
cd frontend

# 2. Install dependencies
npm install

# 3. Configure environment
cp .env.example .env
nano .env  # Set local development URLs

# Example .env for local dev:
# VITE_API_URL=http://localhost:8000/api
# VITE_WS_URL=ws://localhost:8000

# 4. Start development server
npm run dev
# Server starts on http://localhost:3000

# 5. Open browser
open http://localhost:3000
```

### Development Workflow

**Typical Development Session:**

```bash
# Terminal 1: Backend
cd adi-o3-multipass
source venv/bin/activate
python -m server.app

# Terminal 2: Frontend
cd adi-o3-multipass/frontend
npm run dev

# Terminal 3: Logs
sudo journalctl -u brief-extractor -f
# or: tail -f server/processing.log
```

**Hot Reload:**
- **Frontend:** Vite HMR (instant updates on save)
- **Backend:** Manual restart required (no auto-reload in production mode)
  - Development: Set `DEBUG=true` for auto-reload (not recommended for async code)

### Testing

**Backend Unit Tests (Future):**

```python
# tests/test_job_manager.py
import pytest


@pytest.mark.asyncio  # assumes the pytest-asyncio plugin is installed
async def test_create_job():
    manager = JobManager.get_instance()
    job = await manager.create_job(
        file_name="test.pdf",
        file_size=1024,
        file_data=b"...",
        user_id="test-user"
    )
    assert job.phase == JobPhase.QUEUED
```

**Frontend Testing:**

```bash
# Type checking
npm run type-check

# Linting
npm run lint

# Unit tests (future)
npm run test
```

**Integration Testing:**

```bash
# Test full pipeline with sample document
python core/process_brief_enhanced.py examples/sample_brief.pdf \
  --primary-models openai-gpt5,anthropic-sonnet4 \
  --consolidation-model anthropic-opus4
```

### Debugging Tips

**Backend Debugging:**

```python
# Add breakpoint
import pdb;
pdb.set_trace()

# Enhanced logging for specific job
import logging

logger = logging.getLogger(f"job.{job_id}")
logger.setLevel(logging.DEBUG)

# Inspect job state
job = await job_manager.get_job(job_id)
print(f"Phase: {job.phase}, Progress: {job.progress_pct}%")
print(f"Providers: {job.provider_updates}")
```

**Frontend Debugging:**

```typescript
// Access store from console
import { useJobStore } from '@/store/jobStore'
const jobs = useJobStore.getState().jobs
console.log(jobs)

// WebSocket debugging
import { wsClient } from '@/services/websocket'
console.log(wsClient.isConnected())
console.log(wsClient.getConnectionState())

// Force reconnect
wsClient.forceReconnect()
```

**Common Debug Scenarios:**

**Jobs not appearing in queue:**

```bash
# Check WebSocket connection
# Frontend console: wsClient.getConnectionState()
# Backend logs: sudo journalctl -u brief-extractor | grep "WebSocket"

# Check user isolation
# Backend: Verify user_id matches between job creation and WebSocket
# Log: "Created job ... (user: {user_id})"
# Log: "WebSocket authenticated ...
# for user: {user_id}"
```

**Consolidation returning empty:**

```bash
# Check debug files
ls /tmp/consolidation_debug_*.json
cat /tmp/consolidation_debug_20251007_174500.json

# Check OpenAI library version
pip show openai
# Should be >= 1.0.0 for responses.parse() support
```

---

## Troubleshooting Guide

### Common Issues and Resolutions

#### Issue: "Development Mode" banner shows in production

**Symptoms:**
- Login page shows yellow "Development Mode" banner
- Authentication bypassed

**Root Cause:** Backend `DEV_MODE=true` in `.env`

**Resolution:**

```bash
# Edit backend .env file
nano .env

# Change:
DEV_MODE=false

# Restart backend
sudo systemctl restart brief-extractor

# Verify
curl https://ai-sandbox.oliver.solutions/brief-extractor-back/api/auth/config
# Should return: "devMode": false
```

---

#### Issue: WebSocket connect/disconnect loop

**Symptoms:**
- Frontend logs show rapid connect/disconnect
- Backend logs: `'str' object has no attribute 'value'`

**Root Cause:** Job phase serialization issue when phase is string instead of enum

**Resolution:**
Ensure `server/jobs/models.py` has defensive phase handling:

```python
def to_dict(self):
    phase_value = self.phase.value if isinstance(self.phase, JobPhase) else self.phase
    return {'phase': phase_value, ...}
```

---

#### Issue: Jobs don't appear in queue after upload

**Symptoms:**
- Upload succeeds
- Must refresh page to see job

**Root Cause:** WebSocket user ID mismatch (session ID vs real user ID)

**Resolution:**
- Ensure WebSocket authenticates with query parameter: `?token=`
- Backend extracts user ID from token: `websocket.args.get('token')`
- WebSocket client registered with real user ID, not session ID

---

#### Issue: GPT-5 consolidation returns empty object

**Symptoms:**
- Backend logs: `Missing 'assets' key in consolidated response`
- Consolidation phase fails

**Root Cause:** Outdated OpenAI library

**Resolution:**

```bash
# Upgrade OpenAI library
pip install --upgrade openai

# Verify version
pip show openai
# Should be >= 1.0.0

# Restart backend
sudo systemctl restart brief-extractor
```

---

#### Issue: 404 errors for assets (index-*.js, index-*.css)

**Symptoms:**
- Browser console: `Loading module blocked - disallowed MIME type`
- Assets return HTML (404 page) instead of JS/CSS

**Root Cause:** Incorrect base path in Vite config

**Resolution:**

```typescript
// frontend/vite.config.ts
export default defineConfig({
  base: '/brief-extractor/',  // Must match deployment path
  ...
})

// Rebuild (shell): npm run build

// Verify built index.html (shell): cat dist/index.html
// Should show: src="/brief-extractor/assets/index-*.js"
```

---

#### Issue: WebSocket 400 Bad Request

**Symptoms:**
- Backend logs: `GET /ws 1.1" 400`
- WebSocket never establishes

**Root Cause:** Apache not configured for WebSocket upgrade

**Resolution:**

```apache
# Add to Apache config
ProxyPass /brief-extractor-back/ws ws://localhost:8002/ws
ProxyPass /brief-extractor-back/ http://localhost:8002/
ProxyPassReverse /brief-extractor-back/ http://localhost:8002/

# Enable required modules (shell):
#   sudo a2enmod proxy proxy_http proxy_wstunnel rewrite

# Reload Apache (shell):
#   sudo systemctl reload apache2
```

---

#### Issue: CORS errors in browser

**Symptoms:**
- Browser console: `CORS policy: No 'Access-Control-Allow-Origin' header`
- API calls fail

**Root Cause:** `ALLOWED_ORIGINS` misconfigured

**Resolution:**

```bash
# Edit backend .env
ALLOWED_ORIGINS=https://ai-sandbox.oliver.solutions

# NOT: https://ai-sandbox.oliver.solutions/brief-extractor
# (Don't include path, just domain)

# Restart backend
sudo systemctl restart brief-extractor
```

---

#### Issue: File upload fails with 413 Payload Too Large

**Symptoms:**
- Large files rejected
- Error: "File size exceeds XMB limit"

**Root Cause:** Upload size limit too small

**Resolution:**

```bash
# Backend .env
MAX_UPLOAD_SIZE_MB=200  # Increase if needed

# Also check Apache/Nginx limits
# Apache: LimitRequestBody 209715200  # 200MB in bytes
# Nginx: client_max_body_size 200M;

# Restart services
sudo systemctl restart brief-extractor apache2
```

---

## Advanced Topics

### Multi-Model Consolidation Algorithm

**Normalization Phase:**

```python
# 1. Title Normalization
# Input: "1234 - Location A", "Store B - Hero Slider"
# Output: "Wholesale - Hero Slider (Campaign)"

# 2. Category Normalization
# Input: "Paid Social", "Social Media - Paid"
# Output: "Paid Social"

# 3. Specifications Normalization
# Input: ["1080 × 1080", "1080x1080 px"]
# Output: ["1080x1080"]
```

**Deduplication Strategy:**

```python
# Build deduplication key
key = (
    normalized_title,
    normalized_category,
    media,
    tuple(sorted(technical_specifications)),
    asset_type
)

# Merge assets with same key
merged_asset = {
    **base_asset,
    'technical_specifications': union(specs1, specs2),
    'language_country_market': union(markets1, markets2),
    'quantity': max(qty1, qty2)
}
```

**Quality Enhancement:**

```python
# For each field, choose best value from all models
reference_material = longest([model1.ref, model2.ref, model3.ref])
creative_direction = most_detailed([model1.dir, model2.dir, model3.dir])
```

### Custom Prompt Engineering

**Prompt Structure:**

```
prompts/
├── system_multi_perspective.txt     # System message for analysis
├── multi_perspective_analysis.txt   # User prompt template
├── consolidation_analysis.txt       # Consolidation strategy
└── universal_schema.json            # Output schema
```

**Customization:**

```bash
# Edit prompts
nano prompts/multi_perspective_analysis.txt

# Changes take effect immediately (loaded at runtime)
# No need to restart backend
```

**Prompt Variables:**

```python
# multi_perspective_analysis.txt
# Uses: {doc_type}, {document_content}

# consolidation_analysis.txt
# Uses: {models_results}
```

### Adding New AI Models

**Step 1: Create Provider Class**

```python
# core/llm_service/new_provider.py
class NewProvider(BaseLLMProvider):
    async def generate_response(self, messages, schema, **kwargs):
        # Implementation
        pass
```

**Step 2: Register in Provider Manager**

```python
#
# core/llm_service/provider_manager.py
elif provider_name == 'newprovider':
    return NewProvider(model_name=model_name)
```

**Step 3: Add Configuration**

```python
# core/config.py
MODEL_MAPPINGS = {
    'newprovider-model1': ('newprovider', 'model-1'),
}

PRICING = {
    'newprovider-model1': {
        'input': 1.00,
        'output': 3.00
    }
}
```

**Step 4: Register Model Info for the Frontend**

```python
# server/jobs/manager.py
model_info_map = {
    'newprovider-model1': ModelInfo(
        key='newprovider-model1',
        name='New Model',
        provider='NewProvider',
        ...
    )
}
```

---

## Appendix

### File Location Reference

**Configuration Files:**
- Backend config: `/adi-o3-multipass/.env`
- Frontend config: `/adi-o3-multipass/frontend/.env`
- Server config: `/adi-o3-multipass/server/config_runtime.py`
- Core config: `/adi-o3-multipass/core/config.py`
- Vite config: `/adi-o3-multipass/frontend/vite.config.ts`

**Prompt Templates:**
- System prompt: `/adi-o3-multipass/prompts/system_multi_perspective.txt`
- Analysis prompt: `/adi-o3-multipass/prompts/multi_perspective_analysis.txt`
- Consolidation prompt: `/adi-o3-multipass/prompts/consolidation_analysis.txt`
- Schema: `/adi-o3-multipass/prompts/universal_schema.json`

**Data Directories (Production):**
- Uploads: `/var/www/html/brief-extractor/backend/server/data/uploads/`
- Outputs: `/var/www/html/brief-extractor/backend/server/data/outputs/`
- Debug files: `/tmp/consolidation_debug_*.json`, `/tmp/openai_debug_*.txt`

**Logs:**
- Systemd journal: `journalctl -u brief-extractor`
- Application logs: Structured JSON to stdout (captured by systemd)

### Port Reference

| Service | Port | Protocol | Purpose |
|---------|------|----------|---------|
| Frontend Dev Server | 3000 | HTTP | Local development |
| Backend Dev Server | 8000 | HTTP | Local development |
| Backend Production | 8002 | HTTP | Production (behind Apache) |
| Apache Web Server | 443 | HTTPS | Public-facing |

### Key URLs

**Production:**
- Frontend: `https://ai-sandbox.oliver.solutions/brief-extractor/`
- Backend API:
`https://ai-sandbox.oliver.solutions/brief-extractor-back/api`
- Backend WS: `wss://ai-sandbox.oliver.solutions/brief-extractor-back/ws`
- Health Check: `https://ai-sandbox.oliver.solutions/brief-extractor-back/health`

**Development:**
- Frontend: `http://localhost:3000`
- Backend API: `http://localhost:8000/api`
- Backend WS: `ws://localhost:8000/ws`
- Health Check: `http://localhost:8000/health`

### External Service Dependencies

| Service | Purpose | API Key Variable | Endpoint |
|---------|---------|------------------|----------|
| OpenAI | GPT-5 model access | `OPENAI_API_KEY` | `https://api.openai.com/v1/responses` |
| Anthropic | Claude models | `ANTHROPIC_API_KEY` | `https://api.anthropic.com/v1/messages` |
| Google AI | Gemini models | `GOOGLE_API_KEY` | `https://generativelanguage.googleapis.com` |
| LlamaCloud | Document parsing | `LLAMACLOUD_API_KEY` | `https://api.cloud.llamaindex.ai` |
| Microsoft Azure AD | Authentication | (MSAL config) | `https://login.microsoftonline.com` |

### Version Compatibility

**Minimum Versions:**
- Python: 3.13+
- Node.js: 18+
- OpenAI library: 1.0.0+ (for responses API)
- Anthropic library: 0.67.0+ (for async client)
- Google GenAI: 0.4.0+ (for new SDK)

**Browser Support:**
- Chrome/Edge: 90+
- Firefox: 88+
- Safari: 14+
- WebSocket support required

---

## Security Best Practices

### Production Checklist

- [ ] `DEV_MODE=false` in backend `.env`
- [ ] `SECURE_COOKIES=true` in backend `.env`
- [ ] `HTTPS_ONLY=true` in backend `.env`
- [ ] Strong `SESSION_SECRET` (min 32 random chars)
- [ ] ALLOWED_ORIGINS restricted to production domain
- [ ] Apache/Nginx configured with SSL certificates
- [ ] WebSocket proxy configured with SSL
- [ ] File upload size limits enforced
- [ ] CORS properly configured
- [ ] API keys rotated regularly
- [ ] Systemd service runs as restricted user (www-data)
- [ ] File permissions: uploads/outputs not world-readable
- [ ] Regular security updates (pip, npm)
- [ ] Monitoring and alerting configured
- [ ] Backup strategy for job data (if needed)

### Data Privacy Considerations

**User Data:**
- User emails/names from Azure AD (PII)
- Uploaded documents may contain confidential business information
- Generated CSVs contain extracted marketing data

**Retention Policy:**
- Jobs auto-deleted after `FILE_RETENTION_HOURS` (24h default)
- Uploaded files and CSVs deleted with job
- Logs may contain user identifiers (consider log retention)

**GDPR Compliance:**
- User data isolated per `user_id`
- Automatic data deletion (24h retention)
- Right to deletion: DELETE /api/jobs/{id}
- Data portability: CSV download
- Audit trail: Structured logs with user_id

---

## Change Log

### Version 2.0 (October 2025)

**New Features:**
- Multi-tenant architecture with user isolation
- Microsoft Azure AD authentication (MSAL)
- Real-time WebSocket communication
- Enhanced logging with conditional verbosity
- Improved error handling and debug artifacts

**Improvements:**
- PKCE authentication flow (more secure)
- Parallel model execution (2-3x faster)
- Smart WebSocket reconnection
- Comprehensive API documentation

**Bug Fixes:**
- GPT-5 consolidation empty response (OpenAI library upgrade)
- WebSocket authentication with query parameters
- Job phase serialization (enum vs string handling)
- Frontend base path configuration for subpath deployment

### Version 1.0 (September 2025)

- Initial release with multi-model analysis
- LlamaParser integration
- Basic web interface
- CLI support

---

## Support and Contact

**Documentation Updates:**
This document should be updated whenever:
- Architecture changes are made
- New features are added
- Configuration options change
- Deployment procedures are modified

**Getting Help:**
- Review CLAUDE.md for project overview
- Check troubleshooting guide for common issues
- Review backend logs: `journalctl -u brief-extractor`
- Review frontend console for client-side errors
- Check debug artifacts in `/tmp/` for detailed diagnostics

---

**Document End**