michael dc43078e08 changes required for deployment on Oliver web server - various bug fixes, MSAL adjustments, etc.

2025-10-08 14:03:40 -05:00

94 KiB

Raw Permalink Blame History

Brief Extractor - Comprehensive Technical Documentation v2.0

Document Version: 2.0 Last Updated: October 7, 2025 Application Version: 1.0.0 Author: Technical Documentation Team

Executive Summary
System Architecture Overview
Backend Architecture
Frontend Architecture
Data Flow and Processing Pipeline
Authentication and Security
WebSocket Real-Time Communication
API Reference
Data Models and Schemas
Configuration Management
Deployment Architecture
Error Handling and Logging
Performance and Scalability
Development Guide
Troubleshooting Guide

Executive Summary

The Brief Extractor is an enterprise-grade, multi-tenant document analysis platform that leverages multiple cutting-edge AI models (OpenAI GPT-5, Anthropic Claude, Google Gemini) in parallel to extract structured marketing asset information from unstructured creative briefs and presentations.

Key Features

Multi-Model AI Analysis: Parallel processing using 3+ AI models simultaneously for comprehensive data extraction
Intelligent Consolidation: Advanced deduplication and merging of multi-model results
Real-Time Progress Tracking: WebSocket-based live updates with provider-specific progress reporting
Enterprise Authentication: Microsoft Azure AD (MSAL) SSO integration with PKCE flow
Multi-Tenant Architecture: Complete user isolation with per-user job queuing and data segregation
Scalable Processing: Asynchronous job queue with configurable concurrency limits
Production-Ready: Comprehensive error handling, logging, monitoring, and recovery mechanisms

Technology Stack

Backend:

Framework: Quart (async Python web framework)
AI Models: OpenAI GPT-5, Anthropic Claude Opus 4.1/Sonnet 4, Google Gemini 2.5 Pro
Document Processing: LlamaParser cloud service for OCR and extraction
Authentication: MSAL (Microsoft Authentication Library) with JWT validation
Real-Time: WebSocket with automatic reconnection and health monitoring
Data Storage: File-based storage with automatic cleanup and retention policies

Frontend:

Framework: React 18 with TypeScript
Build Tool: Vite 5 with HMR and optimized production builds
State Management: Zustand for global state, TanStack Query for server state
UI Framework: Tailwind CSS with custom design system
Authentication: MSAL React with Azure AD integration
Real-Time: Native WebSocket client with exponential backoff reconnection

System Architecture Overview

High-Level Architecture

graph TB
    subgraph Browser["Client Browser (Frontend)"]
        React["React Application<br/>- Upload UI<br/>- Queue View<br/>- Authentication"]
        WSClient["WebSocket Client<br/>- Live Updates<br/>- Auto-Reconnect<br/>- Connection Health"]
        React <--> WSClient
    end

    subgraph WebServer["Web Server (Apache/Nginx)"]
        SSL["SSL/TLS Termination"]
        WSProxy["WebSocket Upgrade Proxy"]
        Static["Static File Serving"]
    end

    subgraph Backend["Quart Application Server (Backend)"]
        subgraph API["API Layer"]
            AuthAPI["Auth API"]
            JobsAPI["Jobs API"]
            ConfigAPI["Config API"]
        end

        subgraph Queue["Job Queue System"]
            AsyncQueue["Async Queue"]
            Workers["Workers (5)"]
            Semaphore["Semaphore"]
        end

        subgraph WS["WebSocket Manager"]
            Connections["Connections"]
            Broadcasting["Broadcasting"]
            UserTargeting["User Targeting"]
        end

        subgraph Processing["Job Processing Engine"]
            Extract["Content Extraction<br/>(LlamaParser)"]
            Analysis["Parallel Multi-Model<br/>Analysis"]
            Consolidation["Result Consolidation"]
            CSV["CSV Generation"]
        end
    end

    subgraph External["External Services"]
        OpenAI["OpenAI API<br/>(GPT-5)"]
        Anthropic["Anthropic API<br/>(Claude)"]
        Google["Google AI API<br/>(Gemini)"]
        Llama["LlamaCloud API<br/>(Parsing)"]
    end

    React -->|"HTTPS/REST API"| WebServer
    WSClient -->|"WSS (WebSocket)"| WebServer
    WebServer --> API
    WebServer --> WS
    API --> Queue
    Queue --> Processing
    Processing --> OpenAI
    Processing --> Anthropic
    Processing --> Google
    Processing --> Llama
    Processing --> WS

Component Communication Flow

sequenceDiagram
    actor User
    participant Frontend
    participant API as API Endpoint
    participant JobMgr as Job Manager
    participant Queue as Job Queue
    participant Worker as Worker Pool
    participant LLM as LLM Services
    participant WS as WebSocket

    User->>Frontend: Upload File
    Frontend->>API: POST /api/jobs
    API->>JobMgr: create_job()
    JobMgr->>JobMgr: Save file to disk
    JobMgr->>Queue: Enqueue job_id
    JobMgr->>WS: Broadcast job.created
    WS->>Frontend: Job created event
    API->>Frontend: Job[] response

    Queue->>Worker: Pull job_id
    Worker->>Worker: Phase 1: Extract Content
    Worker->>LLM: LlamaParser API
    LLM->>Worker: Markdown content
    Worker->>WS: Progress update (25%)
    WS->>Frontend: job.progress event

    Worker->>Worker: Phase 2: Parallel Analysis
    par Parallel Execution
        Worker->>LLM: OpenAI GPT-5
        Worker->>LLM: Anthropic Claude
        Worker->>LLM: Google Gemini
    end
    LLM->>Worker: All results
    Worker->>WS: Progress update (75%)
    WS->>Frontend: job.progress event

    Worker->>Worker: Phase 3: Consolidation
    Worker->>LLM: Consolidation model
    LLM->>Worker: Merged results
    Worker->>WS: Progress update (90%)

    Worker->>Worker: Phase 4: Generate CSV
    Worker->>WS: job.completed
    WS->>Frontend: Completion event
    Frontend->>User: Show download button

Deployment Architecture

Production Deployment:

Frontend: https://ai-sandbox.oliver.solutions/brief-extractor/
Backend API: https://ai-sandbox.oliver.solutions/brief-extractor-back/api
Backend WebSocket: wss://ai-sandbox.oliver.solutions/brief-extractor-back/ws

Development Environment:

Frontend: http://localhost:3000
Backend API: http://localhost:8000/api
Backend WebSocket: ws://localhost:8000/ws

Backend Architecture

Technology Stack

Core Framework: Quart 0.19+ (async Python web framework based on Flask API)

Chosen for native async/await support required for parallel LLM calls
ASGI-based for WebSocket support
Compatible with Hypercorn ASGI server

Key Dependencies:

quart - Async web framework
quart-cors - CORS middleware for cross-origin requests
openai>=1.0.0 - OpenAI GPT-5 client with responses API
anthropic>=0.67.0 - Anthropic Claude client with async support
google-genai[aiohttp]>=0.4.0 - Google Gemini client with aiohttp
llama-cloud-services>=0.6.62 - LlamaParser document extraction
msal>=1.24.0 - Microsoft Authentication Library
PyJWT>=2.8.0 - JWT token validation
structlog - Structured logging for production environments
python-dotenv - Environment variable management
pydantic>=2.0.0 - Data validation and schema definition

Directory Structure

graph LR
    subgraph Server["server/ - Backend Application"]
        App["app.py<br/>Main application"]
        Config["config_runtime.py<br/>Runtime config"]

        subgraph API["api/ - REST Endpoints"]
            AuthAPI["auth.py<br/>/api/auth/*"]
            ConfigAPI["config.py<br/>/api/config/*"]
            JobsAPI["jobs.py<br/>/api/jobs/*"]
        end

        subgraph Auth["auth/ - Authentication"]
            MSAL["msal_auth.py<br/>JWT validation"]
            Middleware["middleware.py<br/>Decorators"]
        end

        subgraph Jobs["jobs/ - Job Management"]
            Models["models.py<br/>Data models"]
            Manager["manager.py<br/>Singleton registry"]
            Storage["storage.py<br/>File operations"]
        end

        subgraph Runners["runners/ - Execution"]
            JobRunner["job_runner.py<br/>Workers"]
            EnhancedAnalyzer["enhanced_analyzer.py<br/>Progress hooks"]
            Progress["progress.py<br/>Reporting"]
        end

        subgraph WS["ws/ - WebSocket"]
            WSManager["manager.py<br/>Connections"]
        end
    end

    subgraph Core["core/ - Processing Engine"]
        CoreConfig["config.py<br/>Model config"]
        ProcessBrief["process_brief_enhanced.py<br/>DocumentAnalyzer"]
        Consolidation["consolidation_processor.py<br/>Result merging"]

        subgraph LLMService["llm_service/ - Providers"]
            Base["base_provider.py<br/>Abstract interface"]
            OpenAI["openai_provider.py<br/>GPT-5"]
            Anthropic["anthropic_provider.py<br/>Claude"]
            GoogleProv["google_provider.py<br/>Gemini"]
            ProvManager["provider_manager.py<br/>Parallel coordinator"]
        end
    end

    App --> API
    App --> Auth
    App --> Jobs
    App --> Runners
    App --> WS
    Runners --> Core

    style Server fill:#e3f2fd
    style Core fill:#e8f5e9

Core Components

1. Application Factory (`server/app.py`)

Purpose: Creates and configures the Quart application with all routes, middleware, and lifecycle hooks.

Key Responsibilities:

Register API blueprints (auth_bp, config_bp, jobs_bp)
Configure CORS for cross-origin requests
Initialize WebSocket manager and job queue
Set up application lifecycle (before_serving, after_serving)
Start/stop background worker tasks
Define health check and WebSocket endpoints
Configure error handlers (400, 401, 403, 404, 413, 500)

Lifecycle Management:

@app.before_serving
async def startup():
    # Start WebSocket background tasks (ping, cleanup)
    await ws_manager.start_background_tasks()

    # Start job processing workers (configurable count)
    background_workers = await start_background_workers(
        job_manager, ws_manager,
        num_workers=server_config.MAX_CONCURRENT_JOBS
    )

    # Schedule periodic cleanup (hourly)
    cleanup_task = asyncio.create_task(periodic_cleanup(job_manager))

@app.after_serving
async def shutdown():
    # Stop all background workers gracefully
    await stop_background_workers(background_workers)
    await ws_manager.stop_background_tasks()

Critical Configuration:

MAX_CONTENT_LENGTH: File upload size limit (200MB default)
SESSION_SECRET: Used for secure cookie signing
SECURE_COOKIES, HTTPS_ONLY: Security flags for production

2. Job Manager (`server/jobs/manager.py`)

Pattern: Thread-safe Singleton Purpose: Central registry and queue for all processing jobs

Architecture:

classDiagram
    class JobManager {
        <<singleton>>
        -_instance: JobManager
        -jobs: Dict[str, Job]
        -queue: asyncio.Queue
        -processing_semaphore: Semaphore
        -storage: StorageManager
        -_lock: asyncio.Lock
        +create_job(file, user_id) Job
        +get_job(job_id) Job
        +get_user_jobs(user_id) List~Job~
        +delete_job(job_id) bool
        +cleanup_expired_jobs() int
        +serialize_all() List~Dict~
        +get_instance()$ JobManager
    }

    class Job {
        +id: str
        +user_id: str
        +phase: JobPhase
        +progress_pct: int
        +provider_updates: Dict
        +logs: List~LogEntry~
        +model_config: ModelConfiguration
        +update_progress(phase, pct, label)
        +mark_completed(url, summary, path)
        +mark_failed(error)
        +to_dict() Dict
    }

    class StorageManager {
        +upload_dir: Path
        +output_dir: Path
        +save_uploaded_file(data, filename, job_id) str
        +validate_file(filename, size) tuple
        +cleanup_job_files(upload, output)
        +cleanup_expired_files() int
    }

    JobManager --> Job : manages
    JobManager --> StorageManager : uses

Key Operations:

Job Creation Flow:

Validate file (extension, size, name)
Create Job instance with unique UUID
Save file to disk via StorageManager
Add job to in-memory registry
Enqueue job ID for processing
Return job to API endpoint

User Isolation:

Each job tagged with user_id from authenticated token
get_user_jobs() filters by user_id
Users can only see/access their own jobs
WebSocket broadcasts filtered by user

Concurrency Control:

# Semaphore limits concurrent processing
processing_semaphore = asyncio.Semaphore(MAX_CONCURRENT_JOBS)

# Worker acquires semaphore before processing
async with job_manager.processing_semaphore:
    await run_job(job, ws_manager)

Cleanup and Retention:

Periodic cleanup every hour via periodic_cleanup() task
Removes jobs older than FILE_RETENTION_HOURS (24h default)
Cleans up orphaned upload/output files
Preserves active and recent jobs

3. Storage Manager (`server/jobs/storage.py`)

Purpose: Safe file operations with validation and cleanup

Directory Structure:

server/data/
├── uploads/        # Uploaded documents (temporary)
│   └── {job_id}_{sanitized_filename}.{ext}
└── outputs/        # Generated CSV files
    └── {sanitized_basename}-{timestamp}.csv

File Operations:

Validation: Extension whitelist, size limits, filename sanitization
Safe Naming: Job ID prefix to prevent collisions
Async I/O: Uses run_in_executor() to avoid blocking event loop
Automatic Cleanup: Removes files older than retention period

Security Features:

Filename sanitization removes special characters
Length limits prevent path traversal
Extension whitelist: .pdf, .pptx, .docx, .xlsx, .ppt, .doc, .xls
No execution of uploaded files

4. Job Processing Pipeline (`server/runners/job_runner.py`)

Architecture: Background worker pool processing jobs from async queue

Worker Pool:

# Configurable number of workers (default: 5)
workers = []
for i in range(num_workers):
    worker = asyncio.create_task(
        process_job_queue(job_manager, ws_manager),
        name=f"job-worker-{i}"
    )
    workers.append(worker)

Job Execution Flow:

1. Worker pulls job_id from queue (blocking until available)
2. Acquire processing semaphore (concurrency limit)
3. Create ProgressReporter for WebSocket updates
4. Execute run_job(job, ws_manager)
   ├─ Phase 1: Extract content (LlamaParser)
   ├─ Phase 2: Parallel multi-model analysis
   ├─ Phase 3: Consolidate results
   ├─ Phase 4: Generate CSV
   └─ Phase 5: Mark completed/failed
5. Release semaphore
6. Mark queue task as done

Progress Reporting:

Each phase reports progress percentage (0-100)
Provider-specific updates (started, success, error, tokens, cost)
Real-time log streaming to WebSocket clients
Automatic error capture and reporting

5. LLM Service Layer (`core/llm_service/`)

Design Pattern: Provider abstraction with async parallel execution

Provider Hierarchy:

classDiagram
    class BaseLLMProvider {
        <<abstract>>
        +api_key: str
        +model_name: str
        +generate_response(messages, schema)* LLMResponse
        +validate_config()* bool
        +estimate_cost(input, output, cached)* float
        +get_max_tokens()* int
        +prepare_messages(system, user) List
    }

    class OpenAIProvider {
        +reasoning_effort: str
        +timeout: int
        +client: AsyncOpenAI
        +generate_response() LLMResponse
        +set_reasoning_effort(effort)
        -_create_pydantic_model(schema)
        -_save_debug_response()
    }

    class AnthropicProvider {
        +thinking_budget: int
        +temperature: float
        +client: AsyncAnthropic
        +generate_response() LLMResponse
        -_two_call_approach()
        -_convert_to_tool_schema()
    }

    class GoogleProvider {
        +thinking_budget: int
        +temperature: float
        +client: genai.Client
        +generate_response() LLMResponse
        -_convert_to_gemini_schema()
    }

    BaseLLMProvider <|-- OpenAIProvider
    BaseLLMProvider <|-- AnthropicProvider
    BaseLLMProvider <|-- GoogleProvider

Common Interface:

async def generate_response(
    messages: List[Dict[str, str]],
    schema: Optional[Dict[str, Any]] = None,
    **kwargs
) -> LLMResponse

Provider-Specific Features:

OpenAI (openai_provider.py):

Uses client.responses.parse() API for structured output
Configurable reasoning effort: high, medium, low, minimal
Native Pydantic model support via text_format parameter
Automatic retry with exponential backoff (max_retries: 2)
Timeout: 3600 seconds (1 hour) for long documents
Two-stage validation: check output_parsed, fallback to choices[0].message.content

Anthropic (anthropic_provider.py):

Two-call approach due to thinking mode incompatibility with structured output:
1. Call A: Extended thinking with analysis (no forced tools)
2. Call B: Structured JSON formatting (no thinking)
Thinking budget: 12,000 tokens (configurable)
Temperature: 1.0 for creative analysis
Max tokens: 32,000 (Claude Sonnet 4), 64,000 (Claude Opus 4.1)
Schema conversion to Anthropic tool format

Google (google_provider.py):

Uses new google-genai SDK with client.aio async methods
Native thinking support with configurable budget (12,000 tokens)
Schema conversion to Gemini response_schema format
Largest context window: 2M tokens (Gemini 2.5 Pro)
Temperature: 0.7 for balanced creativity/consistency

Provider Manager (provider_manager.py):

Coordinates parallel execution across multiple providers
Uses asyncio.gather() for true concurrent API calls
Implements minimum success threshold (default: 1 model must succeed)
Tracks per-provider timing, tokens, and costs
Handles partial failures gracefully

Parallel Execution Example:

# All models process simultaneously
responses = await provider_manager.execute_parallel_analysis(
    model_keys=['openai-gpt5', 'anthropic-sonnet4', 'google-gemini25'],
    messages=analysis_messages,
    schema=UNIVERSAL_BASE_DELIVERABLE_SCHEMA,
    minimum_success_threshold=1
)

# Total time = slowest model, not sum of all models
# Example: GPT-5 (110s) + Claude (78s) + Gemini (49s) = 110s total (not 237s)

6. Consolidation System (`core/consolidation_processor.py`)

Purpose: Intelligently merge results from multiple AI models into optimal dataset

Consolidation Strategy:

Inclusion Bias: If any model found a legitimate deliverable, include it
Normalization: Canonicalize titles, categories, specifications before deduplication
Smart Deduplication: Merge only when core identity matches (not just similar text)
Quality Enhancement: Combine best specifications from all contributing models

Process Flow:

Input: [GPT-5 Result, Claude Result, Gemini Result]
         │
         ├─ Format results as comparison prompt
         ├─ Load consolidation strategy template
         └─ Execute with consolidation model (GPT-5 or Claude Opus)
                │
                ├─ Returns: Consolidated base deliverables
                ├─ Validation: Ensure 'assets' key present
                └─ Expansion: Generate individual assets from multipliers

Multiplier Expansion:

Base deliverables have multiplier arrays (e.g., 5 sizes × 3 markets = 15 assets)
Uses itertools.product() for Cartesian product expansion
Validates: technical_specifications × language_country_market ≈ quantity

7. Authentication System (`server/auth/`)

Architecture: MSAL-based SSO with development mode bypass

Components:

MSALAuthenticator (msal_auth.py):

Validates JWT tokens from Microsoft Azure AD
Supports PKCE flow (Public Client - no client secret required)
Extracts user claims: oid, preferred_username, name, roles
Token expiration checking via exp claim
Audience validation (accepts Microsoft Graph audience)

Middleware (middleware.py):

@dev_mode_bypass: Creates mock user in development, validates in production
@auth_required: Strict authentication enforcement
@optional_auth: Extracts user if present but doesn't require
get_user_id(): Safely extracts user ID from request context

Token Flow:

1. Frontend obtains token via MSAL.js redirect flow
2. Token stored in localStorage
3. Frontend sends token in Authorization header: "Bearer <jwt>"
4. Backend middleware extracts and validates token
5. User info stored in request context (g.current_user)
6. Endpoints access user via get_user_id()

Development Mode:

DEV_MODE=true: Bypasses MSAL, creates mock user
DEV_MODE=false: Requires valid Microsoft account authentication
Never use DEV_MODE in production!

8. WebSocket Manager (`server/ws/manager.py`)

Pattern: Singleton with async lock for thread safety

Architecture:

class WebSocketClient:
    client_id: UUID            # Unique connection identifier
    user_id: str              # Authenticated user ID (for targeting)
    connected_at: datetime    # Connection timestamp
    last_ping: datetime       # Heartbeat tracking
    websocket: Quart WebSocket object

Connection Lifecycle:

Client connects to /ws?token=<jwt>
Backend validates token and extracts user ID
Create WebSocketClient and register in manager
Send connection.established acknowledgment
Send initial queue.snapshot with user's jobs
Enter message loop (ping/pong, handle client messages)
On disconnect: unregister client

Broadcasting Modes:

broadcast_to_all(): Send to all connected clients
broadcast_to_user(user_id): Send to specific user's connections
broadcast_job_update(job_id): Send job-specific updates (currently broadcasts to all)

Background Tasks:

Ping Loop: Sends ping every 30 seconds to keep connections alive
Cleanup Loop: Removes stale connections (no ping for 90+ seconds)

Message Types:

connection.established: Initial handshake
queue.snapshot: Full job list on connect
job.created: New job added
job.accepted: Job entered processing queue
job.progress: Phase and progress percentage updates
job.provider_update: Per-model status (started, tokens, cost, error)
job.log: Real-time log streaming
job.completed: Processing finished with results
job.failed: Processing error
job.deleted: Job removed
ping/pong: Heartbeat mechanism

9. API Endpoints

Authentication API (/api/auth/*):

Endpoint	Method	Purpose	Auth Required
`/api/auth/config`	GET	Get MSAL configuration for frontend	No
`/api/auth/validate`	POST	Validate access token	No
`/api/auth/user`	GET	Get current user info	Yes (bypass in dev)
`/api/auth/logout`	POST	Get logout URL for MSAL	No

Configuration API (/api/config/*):

Endpoint	Method	Purpose	Auth Required
`/api/config/models`	GET	List available AI models	Yes (bypass in dev)
`/api/config/defaults`	GET	Get default model configuration	Yes (bypass in dev)
`/api/config/estimate`	POST	Estimate processing cost	Yes (bypass in dev)
`/api/config/validate`	POST	Validate model configuration	Yes (bypass in dev)
`/api/config/system`	GET	Get system information	Yes (bypass in dev)

Jobs API (/api/jobs/*):

Endpoint	Method	Purpose	Auth Required
`/api/jobs`	POST	Create new jobs (multipart file upload)	Yes (bypass in dev)
`/api/jobs`	GET	List user's jobs (paginated)	Yes (bypass in dev)
`/api/jobs/{id}`	GET	Get specific job details	Yes (bypass in dev)
`/api/jobs/{id}`	DELETE	Delete job and files	Yes (bypass in dev)
`/api/jobs/{id}/download`	GET	Download CSV result (binary)	Yes (bypass in dev)
`/api/jobs/{id}/logs`	GET	Get job logs (paginated)	Yes (bypass in dev)
`/api/jobs/batch-download`	POST	Download multiple CSVs as ZIP	Yes (bypass in dev)
`/api/jobs/stats`	GET	Get job statistics for user	Yes (bypass in dev)
`/api/jobs/cleanup`	POST	Clean up expired jobs	Yes (bypass in dev)

Health Endpoint:

/health (GET): System health with queue stats, WebSocket connections, config info

WebSocket Endpoint:

/ws (WebSocket): Real-time bidirectional communication with token-based auth

Frontend Architecture

Technology Stack

Core Framework: React 18.2 with TypeScript 5.2

Chosen for component-based architecture and type safety
Concurrent rendering features for smooth UI updates
Strict mode enabled for development

Build System: Vite 5.0

Fast HMR (Hot Module Replacement) for development
Optimized production builds with code splitting
Environment variable injection at build time

State Management:

Zustand 4.4: Global client state (jobs, connection status)
TanStack Query 5.8: Server state management, caching, background refetching
MSAL React 2.1: Authentication state

UI Framework:

Tailwind CSS 3.3: Utility-first styling
Lucide React: Icon system (tree-shakeable)
Custom Components: Reusable UI library

Key Dependencies:

react, react-dom - Core React libraries
@azure/msal-browser, @azure/msal-react - Microsoft authentication
axios - HTTP client with interceptors
zustand - State management
@tanstack/react-query - Server state and caching
lucide-react - Icon components
tailwind-merge, clsx - Dynamic className utilities

Directory Structure

graph LR
    subgraph Frontend["frontend/src/ - React Application"]
        Main["main.tsx<br/>Entry point"]
        App["App.tsx<br/>Root component"]

        subgraph Components["components/"]
            subgraph AuthComp["auth/"]
                AuthProvider["AuthProvider.tsx<br/>MSAL wrapper"]
                AuthGuard["AuthGuard.tsx<br/>Route protection"]
                LoginPage["LoginPage.tsx<br/>Login UI"]
            end

            subgraph UploadComp["upload/"]
                UploadPanel["UploadPanel.tsx<br/>File upload"]
                ModelSelector["ModelSelector.tsx<br/>Model config"]
                CostEstimator["CostEstimator.tsx<br/>Cost preview"]
            end

            subgraph QueueComp["queue/"]
                QueueView["QueueView.tsx<br/>Job list"]
                JobCard["JobCard.tsx<br/>Job summary"]
                JobAccordion["JobAccordion.tsx<br/>Details view"]
                ProviderChips["ProviderChips.tsx<br/>Status badges"]
            end

            subgraph UIComp["ui/"]
                Button["Button.tsx"]
                Card["Card.tsx"]
                ProgressBar["ProgressBar.tsx"]
            end

            Dashboard["Dashboard.tsx<br/>Main layout"]
        end

        subgraph Services["services/"]
            APIClient["api.ts<br/>Axios client"]
            WSClient["websocket.ts<br/>WS client"]
        end

        subgraph Stores["store/"]
            AuthStore["authStore.ts<br/>Auth state"]
            JobStore["jobStore.ts<br/>Job state"]
        end

        subgraph Hooks["hooks/"]
            UseJobs["useJobs.ts"]
            UseWS["useWebSocket.ts"]
        end

        Types["types/api.ts<br/>TypeScript defs"]
    end

    Main --> App
    App --> Components
    Components --> Services
    Components --> Stores
    Components --> Hooks
    Services --> Types
    Stores --> Types

    style Frontend fill:#fff3e0
    style Components fill:#e1f5fe
    style Services fill:#f3e5f5
    style Stores fill:#e8f5e9

State Management Architecture

Authentication Store (`store/authStore.ts`)

Zustand Store with Persistence:

interface AuthState {
    isAuthenticated: boolean
    user: User | null
    authConfig: AuthConfig | null
    isLoading: boolean
    error: string | null

    // Actions
    login(accessToken: string): Promise<void>
    logout(): Promise<void>
    checkAuth(): Promise<void>
}

Key Features:

Persists to localStorage (excludes sensitive data)
Automatic token validation on mount
Handles MSAL redirect responses
Manages logout flow with Azure AD

Login Flow:

stateDiagram-v2
    [*] --> Unauthenticated
    Unauthenticated --> InitAuth: Load app
    InitAuth --> GetConfig: GET /api/auth/config
    GetConfig --> LoginScreen: Show login page

    LoginScreen --> MSALRedirect: User clicks login
    MSALRedirect --> AzureLogin: Redirect to Microsoft
    AzureLogin --> Authenticating: User enters credentials
    Authenticating --> MFACheck: Credentials valid
    MFACheck --> TokenExchange: MFA complete (if required)

    TokenExchange --> RedirectBack: Auth code received
    RedirectBack --> HandleRedirect: MSAL.handleRedirectPromise()
    HandleRedirect --> ValidateToken: POST /api/auth/validate
    ValidateToken --> Authenticated: Token valid
    ValidateToken --> LoginScreen: Token invalid

    Authenticated --> ConnectWS: Connect WebSocket
    ConnectWS --> Dashboard: Show main UI
    Dashboard --> [*]

    note right of TokenExchange
        PKCE flow
        No client secret
    end note

    note right of ValidateToken
        Backend checks:
        - exp claim
        - aud claim
        - Extracts user ID
    end note

Job Store (`store/jobStore.ts`)

Zustand Store for Job Queue:

interface JobState {
    jobs: Record<string, Job>              // Job registry by ID
    connectionStatus: 'connecting' | 'connected' | 'disconnected' | 'error'
    selectedModels: ModelConfiguration | null
    availableModels: ModelInfo[]

    // Job Management
    addJob(job: Job): void
    updateJob(id: string, updates: Partial<Job>): void
    updateProvider(jobId, modelKey, update): void
    addLog(jobId, logEntry): void
    removeJob(id: string): void

    // WebSocket Connection
    connectWebSocket(): void
    disconnectWebSocket(): void
    setConnectionStatus(status): void

    // Model Configuration
    loadAvailableModels(): Promise<void>
    loadDefaultConfig(): Promise<void>

    // Selectors
    getActiveJobs(): Job[]
    getCompletedJobs(): Job[]
    getFailedJobs(): Job[]
    getJobsByStatus(status): Job[]
}

WebSocket Integration:

Sets up event handlers for all WebSocket message types
Updates job state in real-time as messages arrive
Manages connection status for UI indicators
Automatically reconnects on disconnect

Component Architecture

Dashboard Component (`components/Dashboard.tsx`)

Layout Structure:

graph TB
    subgraph Dashboard["Dashboard Layout"]
        subgraph Header["Header Bar"]
            Logo["Logo + Title"]
            ConnStatus["Connection Status<br/>Indicator"]
            Stats["Quick Stats<br/>(Active/Complete/Failed)"]
            UserMenu["User Info + Logout"]
        end

        subgraph Main["Main Content Area"]
            subgraph Upload["Upload Panel"]
                FileSelect["Multi-file Selection<br/>(Drag & Drop)"]
                ModelConfig["Model Configuration<br/>(Primary + Consolidation)"]
                CostEst["Cost Estimation<br/>(Real-time)"]
            end

            subgraph Queue["Queue View"]
                Active["Active Jobs<br/>(Progress Bars)"]
                Complete["Completed Jobs<br/>(Download Links)"]
                Failed["Failed Jobs<br/>(Error Details)"]
                Batch["Batch Actions<br/>(Multi-download)"]
            end
        end

        subgraph Footer["Footer Bar"]
            Version["Version Info"]
            PoweredBy["AI Model Credits"]
            UserInfo["Current User"]
        end
    end

    Header --> Main
    Main --> Footer
    Upload -.-> Queue

    style Header fill:#f5f5f5
    style Upload fill:#e3f2fd
    style Queue fill:#e8f5e9
    style Footer fill:#f5f5f5

Real-Time Features:

Connection status indicator with manual reconnect button
Live job count badges (processing, completed, failed)
Auto-refresh on WebSocket disconnect fallback

Upload Panel (`components/upload/UploadPanel.tsx`)

Features:

Multi-file drag-and-drop with validation
Model selector with primary + consolidation configuration
Real-time cost estimation before upload
Progress indication during upload
File size and type validation

Upload Workflow:

stateDiagram-v2
    [*] --> FileSelection: User drops/selects files
    FileSelection --> Validation: Files selected
    Validation --> ModelConfig: Validation passed
    Validation --> Error: Validation failed
    Error --> FileSelection: Fix issues

    ModelConfig --> CostEstimate: Models configured
    CostEstimate --> ConfirmUpload: Review cost
    ConfirmUpload --> Uploading: User confirms

    Uploading --> CreateJobs: POST /api/jobs
    CreateJobs --> JobCreated: Backend creates jobs
    JobCreated --> QueueUpdate: WebSocket job.created
    QueueUpdate --> [*]: Jobs in queue

    note right of Validation
        Size: max 200MB
        Extensions: .pdf, .pptx, .docx, .xlsx
    end note

    note right of CostEstimate
        Real-time calculation
        Based on file size + model selection
    end note

Queue View (`components/queue/QueueView.tsx`)

Display Sections:

Active Jobs: Real-time progress bars, phase indicators, provider chips
Completed Jobs: Summary stats, download button, expansion details
Failed Jobs: Error messages, retry capability (future feature)

Job Card Features:

Expandable accordion for detailed view
Provider-specific status chips (color-coded)
Real-time log streaming in expanded view
Progress percentage with phase labels
Token usage and cost display

WebSocket Client (`services/websocket.ts`)

Features:

Automatic reconnection with exponential backoff
Connection health monitoring via ping/pong
Token-based authentication via query parameter
Event-driven message handling
Window focus/visibility detection for smart reconnection

Reconnection Strategy:

stateDiagram-v2
    [*] --> Disconnected
    Disconnected --> Attempt1: Wait 5s
    Attempt1 --> Connected: Success
    Attempt1 --> Attempt2: Failed

    Attempt2 --> Connected: Success
    Attempt2 --> Attempt3: Failed (wait 10s)

    Attempt3 --> Connected: Success
    Attempt3 --> GaveUp: Failed (wait 20s)

    GaveUp --> GaveUp: Stop retrying
    Connected --> Disconnected: Connection lost

    GaveUp --> Attempt1: Window focus

    note right of Attempt1
        Initial delay: 5s
        Max attempts: 3
    end note

    note right of Attempt2
        Exponential backoff
        10s delay
    end note

    note right of Attempt3
        Final attempt
        20s delay
    end note

    note right of GaveUp
        Manual reconnect only
        Health check every 30s
    end note

Authentication:

// WebSocket URL with token
wss://domain.com/ws?token=<jwt_access_token>

// Backend extracts token from query param
token = websocket.args.get('token')
user_info = await msal_auth.validate_token(token)
client = register_client(user_info['oid'])

Data Flow and Processing Pipeline

Complete Processing Flow

flowchart TD
    Start([User Uploads File]) --> Upload[1. FILE UPLOAD<br/>POST /api/jobs multipart/form-data<br/>Files + modelConfig]

    Upload --> JobCreate[2. JOB CREATION Backend<br/>- Validate files<br/>- Create Job UUID<br/>- Save to disk<br/>- Add to registry<br/>- Enqueue job_id<br/>- Broadcast: job.created]

    JobCreate --> WorkerPull[3. WORKER POOL<br/>Pull from queue<br/>Acquire semaphore]

    WorkerPull --> Extract[4. STAGE 1: CONTENT EXTRACTION<br/>Phase: EXTRACT_CONTENT<br/>- LlamaParser API call<br/>- OCR + table detection<br/>- Returns markdown<br/>Progress: 25%]

    Extract --> ParallelAnalysis[5. STAGE 2: PARALLEL ANALYSIS<br/>Phase: LLM_ANALYSIS<br/>asyncio.gather simultaneous]

    ParallelAnalysis --> GPT5[OpenAI GPT-5<br/>Reasoning: medium<br/>~110 seconds]
    ParallelAnalysis --> Claude[Anthropic Sonnet 4<br/>Two-call approach<br/>~78 seconds]
    ParallelAnalysis --> Gemini[Google Gemini 2.5<br/>Thinking enabled<br/>~49 seconds]

    GPT5 --> Gather[Collect Results<br/>Total time = slowest model 110s<br/>Progress: 75%]
    Claude --> Gather
    Gemini --> Gather

    Gather --> Consolidate[6. STAGE 3: CONSOLIDATION<br/>Phase: CONSOLIDATION<br/>- Format model results<br/>- Load strategy template<br/>- Execute consolidation model<br/>- Smart deduplication<br/>- Validate 'assets' key<br/>Progress: 80%]

    Consolidate --> Expand[7. STAGE 4: MULTIPLIER EXPANSION<br/>- Extract multiplier arrays<br/>- Cartesian product<br/>- 3 sizes × 5 markets = 15 assets<br/>- Validate quantity]

    Expand --> CSVGen[8. STAGE 5: CSV GENERATION<br/>Phase: CSV_GENERATION<br/>- Convert to CSV rows<br/>- Async file write<br/>- Create JobSummary<br/>- Mark COMPLETED<br/>Progress: 100%]

    CSVGen --> WSUpdate[9. WEBSOCKET UPDATE<br/>Broadcast: job.completed<br/>- resultCsvUrl<br/>- summary data]

    WSUpdate --> UIUpdate[10. FRONTEND UPDATE<br/>- JobStore updates<br/>- UI re-renders<br/>- Download button active<br/>- Summary displayed]

    UIUpdate --> End([Processing Complete])

    style ParallelAnalysis fill:#e1f5ff
    style GPT5 fill:#10a37f
    style Claude fill:#d4a373
    style Gemini fill:#4285f4
    style Gather fill:#e1f5ff

Parallel Processing Optimization

gantt
    title Processing Time Comparison
    dateFormat X
    axisFormat %S

    section Sequential
    GPT-5 (110s)     :0, 110
    Claude (78s)     :110, 188
    Gemini (49s)     :188, 237
    Total 237s       :milestone, 237, 237

    section Parallel
    GPT-5 (110s)     :0, 110
    Claude (78s)     :0, 78
    Gemini (49s)     :0, 49
    Total 110s       :milestone, 110, 110

Performance Gain:

Sequential: 237 seconds (sum of all models)
Parallel: 110 seconds (max of all models)
Speedup: 2.15x faster

Implementation:

# Create tasks for all models
tasks = [
    asyncio.create_task(openai_provider.generate_response(...)),
    asyncio.create_task(anthropic_provider.generate_response(...)),
    asyncio.create_task(google_provider.generate_response(...))
]

# Execute all concurrently
results = await asyncio.gather(*tasks, return_exceptions=True)

# Process results (all models complete at ~same time)

Authentication and Security

Microsoft Azure AD Integration

Authentication Flow: PKCE (Proof Key for Code Exchange) - Public Client Flow

Why PKCE:

No client secret required (secure for SPAs)
More secure than implicit flow
Recommended by Microsoft for browser-based apps
Frontend-initiated, backend validates

Complete Authentication Flow:

1. Frontend Initialization:
   GET /api/auth/config
   └─ Returns: { clientId, authority, redirectUri, devMode }

2. User Clicks "Sign in with Microsoft":
   MSAL.loginRedirect({
       scopes: ['openid', 'profile', 'User.Read'],
       redirectUri: 'https://domain.com/brief-extractor/'
   })

3. Redirect to Microsoft Login:
   User authenticates with work/school account
   MFA if configured in Azure AD

4. Microsoft Redirects Back:
   https://domain.com/brief-extractor/?code=...&state=...

5. MSAL Exchanges Code for Token:
   - Frontend MSAL library handles token exchange
   - Receives access token (JWT)
   - Token valid for ~1 hour

6. Frontend Validates Token:
   POST /api/auth/validate
   Body: { "accessToken": "<jwt>" }
   └─ Backend validates:
      - JWT signature (future: using Azure JWKS)
      - Expiration (exp claim)
      - Audience (aud claim)
      - Returns user info if valid

7. Store Token:
   localStorage.setItem('accessToken', token)

8. All Subsequent Requests:
   Authorization: Bearer <jwt>
   ├─ API requests: Axios interceptor adds header
   └─ WebSocket: Query parameter ?token=<jwt>

Security Mechanisms

Transport Security:

HTTPS enforced in production (HTTPS_ONLY=true)
WSS (WebSocket Secure) for real-time communication
Secure cookies in production (SECURE_COOKIES=true)

Authentication Security:

JWT token validation on every request
Token expiration enforcement (Azure AD TTL: ~1 hour)
No client secrets in frontend code (PKCE flow)
Automatic logout on 401 responses

Authorization:

User ID extraction from validated JWT (oid claim)
All jobs tagged with user_id
API endpoints filter by user_id
Users cannot access other users' jobs/files

Input Validation:

File extension whitelist
File size limits (200MB default)
Filename sanitization (remove special chars)
Request payload size limits
CORS restrictions to allowed origins

Data Isolation:

Per-user job filtering in all endpoints
WebSocket broadcasts filtered by user_id
File storage uses job-specific UUIDs
No shared state between users

Development Mode Security:

# DEV_MODE bypasses authentication - NEVER use in production!
if DEV_MODE:
    # Creates mock user without validation
    return {'oid': 'dev-user-id', 'name': 'Development User'}
else:
    # Full JWT validation required
    validate_token(access_token)

Azure AD Configuration Requirements

App Registration Settings:

Platform: Single-page application
Redirect URI: https://ai-sandbox.oliver.solutions/brief-extractor/
Supported Account Types: Single tenant or multi-tenant
API Permissions: Microsoft Graph → User.Read (delegated)
Token Configuration: ID tokens enabled for implicit flow

Required Environment Variables (Backend):

MSAL_CLIENT_ID=<from Azure Portal>
MSAL_TENANT_ID=<from Azure Portal>
MSAL_AUTHORITY=https://login.microsoftonline.com/<tenant_id>
MSAL_REDIRECT_URI=https://ai-sandbox.oliver.solutions/brief-extractor/
DEV_MODE=false  # CRITICAL: Must be false in production

WebSocket Real-Time Communication

Architecture

Connection Model: Persistent bidirectional communication

Uses native WebSocket API (browser) and Quart websocket (server)
One connection per user session (can have multiple tabs = multiple connections)
Automatic reconnection on network failures

Message Protocol

Message Structure:

{
  "type": "message_type",
  "timestamp": "2025-10-07T17:45:08.015Z",
  "jobId": "uuid",
  "...": "message-specific fields"
}

Message Types (Server → Client):

Connection Management:

{
  "type": "connection.established",
  "clientId": "uuid",
  "userId": "user-oid",
  "connectedAt": "2025-10-07T17:40:00.000Z"
}

Queue Snapshot (sent on connect):

{
  "type": "queue.snapshot",
  "jobs": [Job, Job, ...]  // All user's jobs
}

Job Lifecycle:

// Job created
{
  "type": "job.created",
  "job": {Job object}
}

// Job accepted into queue
{
  "type": "job.accepted",
  "jobId": "uuid"
}

// Progress update
{
  "type": "job.progress",
  "jobId": "uuid",
  "phase": "LLM_ANALYSIS",
  "progressPct": 45,
  "stepLabel": "Analyzing with Claude Sonnet 4",
  "providerUpdates": {
    "openai-gpt5": {
      "status": "success",
      "tokensIn": 5000,
      "tokensOut": 3000,
      "costUsd": 0.045,
      "latencyMs": 85000
    }
  }
}

// Provider-specific update
{
  "type": "job.provider_update",
  "jobId": "uuid",
  "modelKey": "anthropic-sonnet4",
  "update": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514",
    "status": "success",
    "tokensIn": 6000,
    "tokensOut": 2500,
    "costUsd": 0.055,
    "latencyMs": 78000
  }
}

// Real-time log entry
{
  "type": "job.log",
  "jobId": "uuid",
  "logEntry": {
    "timestamp": "2025-10-07T17:42:46.474Z",
    "level": "INFO",
    "message": "Consolidation completed: 9 base deliverables"
  }
}

// Job completion
{
  "type": "job.completed",
  "jobId": "uuid",
  "resultCsvUrl": "/api/jobs/{uuid}/download",
  "summary": {
    "docType": "presentation",
    "assetsExtracted": 19,
    "confidenceScore": 0.95,
    "costUsdTotal": 0.2759,
    "tokensTotal": 41322,
    "processingTimeSeconds": 291.1
  }
}

// Job failure
{
  "type": "job.failed",
  "jobId": "uuid",
  "error": "Consolidation failed: Response missing 'assets' key"
}

// Job deleted
{
  "type": "job.deleted",
  "jobId": "uuid"
}

Heartbeat:

// Server → Client (every 30s)
{
  "type": "ping",
  "timestamp": "2025-10-07T17:45:00.000Z"
}

// Client → Server
{
  "type": "pong"
}

Frontend WebSocket Implementation

Connection Management (services/websocket.ts):

class WebSocketClient {
  private ws: WebSocket | null
  private reconnectInterval: 5000ms (initial)
  private maxReconnectInterval: 60000ms
  private maxReconnectAttempts: 3

  connect() {
    const wsUrl = `${VITE_WS_URL}/ws?token=${accessToken}`
    this.ws = new WebSocket(wsUrl)

    this.ws.onopen = () => {
      // Reset reconnection counters
      // Start ping interval (30s)
      // Notify connection handlers
    }

    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data)
      this.handleMessage(message)  // Route to event handlers
    }

    this.ws.onclose = () => {
      // Stop ping interval
      // Schedule reconnection (if not intentional)
    }
  }

  scheduleReconnect() {
    if (reconnectAttempts >= maxReconnectAttempts) {
      // Stop trying after 3 attempts
      return
    }

    setTimeout(() => {
      this.connect()
      this.reconnectInterval *= 2  // Exponential backoff
    }, this.reconnectInterval)
  }
}

State Updates via Zustand:

// Event handlers registered in jobStore
wsClient.on('job.progress', (message) => {
  updateJob(message.jobId, {
    phase: message.phase,
    progressPct: message.progressPct,
    stepLabel: message.stepLabel,
    providerUpdates: message.providerUpdates
  })
  // React components re-render automatically
})

Connection Resilience

Auto-Reconnection Scenarios:

Network interruption
Server restart
Apache/Nginx reload
Temporary backend unavailability

Smart Reconnection:

Window focus event: Reset attempts, immediate reconnect
Page visibility change: Reduce penalty, try reconnect
Health check: Ping /health every 30s when disconnected
Connection restoration: Resume from last state

Fallback Without WebSocket:

App remains fully functional
Users can still upload, view, download
Progress updates require manual page refresh
No real-time log streaming (logs available on demand)

API Reference

Authentication API

`GET /api/auth/config`

Get MSAL configuration for frontend initialization.

Request: None (no auth required)

Response:

{
  "config": {
    "clientId": "9079054c-9620-4757-a256-23413042f1ef",
    "authority": "https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385",
    "redirectUri": "https://ai-sandbox.oliver.solutions/brief-extractor/",
    "devMode": false
  },
  "devMode": false
}

`POST /api/auth/validate`

Validate an access token and return user information.

Request:

{
  "accessToken": "eyJ0eXAiOiJKV1QiLCJhbGc..."
}

Response (Success):

{
  "valid": true,
  "user": {
    "id": "38abcbd2-7558-4f64-aec2-fafc7807552c",
    "username": "user@domain.com",
    "name": "User Name",
    "roles": ["user"]
  }
}

Response (Invalid):

{
  "valid": false,
  "error": "invalid_token",
  "message": "Token is invalid or expired"
}

`GET /api/auth/user`

Get current authenticated user information.

Headers: Authorization: Bearer <token>

Response:

{
  "user": {
    "id": "38abcbd2-7558-4f64-aec2-fafc7807552c",
    "username": "user@domain.com",
    "name": "User Name",
    "roles": ["user"]
  }
}

`POST /api/auth/logout`

Get logout URL for proper Microsoft session termination.

Request:

{
  "redirectUri": "https://ai-sandbox.oliver.solutions/brief-extractor/"
}

Response:

{
  "logoutUrl": "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/logout?post_logout_redirect_uri=..."
}

Jobs API

`POST /api/jobs`

Create new processing jobs from uploaded files.

Headers:

Authorization: Bearer <token>
Content-Type: multipart/form-data

Request Body:

file_0, file_1, ... : File uploads
modelConfig (optional): JSON string with model configuration

Model Config Structure:

{
  "primaryModels": ["openai-gpt5", "anthropic-sonnet4", "google-gemini25"],
  "consolidationModel": "openai-gpt5",
  "minimumSuccessThreshold": 1
}

Response:

{
  "jobs": [
    {
      "id": "4614818d-38c6-4eac-aa39-659c89d90836",
      "fileName": "brief.pdf",
      "fileSize": 1048576,
      "createdAt": "2025-10-07T17:40:00.000Z",
      "updatedAt": "2025-10-07T17:40:00.000Z",
      "userId": "38abcbd2-7558-4f64-aec2-fafc7807552c",
      "phase": "QUEUED",
      "progressPct": 0,
      "stepLabel": "Queued for processing",
      "providerUpdates": {},
      "error": null,
      "resultCsvUrl": null,
      "summary": null,
      "logs": [],
      "modelConfig": {
        "primaryModels": ["openai-gpt5", "anthropic-sonnet4", "google-gemini25"],
        "consolidationModel": "openai-gpt5",
        "minimumSuccessThreshold": 1
      }
    }
  ],
  "errors": []
}

Error Responses:

// No files
{
  "error": "no_files",
  "message": "No files provided for upload"
}

// Invalid model config
{
  "error": "invalid_model_config",
  "message": "Invalid model configuration: ..."
}

// File too large
{
  "error": "file_too_large",
  "message": "File size exceeds 200MB limit"
}

`GET /api/jobs`

List jobs for the current authenticated user.

Headers: Authorization: Bearer <token>

Query Parameters:

limit (optional): Max results (default: 50, max: 100)
offset (optional): Skip count for pagination (default: 0)
status (optional): Filter by phase (e.g., "COMPLETED")

Response:

{
  "jobs": [Job, Job, ...],
  "pagination": {
    "limit": 50,
    "offset": 0,
    "count": 15
  }
}

`GET /api/jobs/{job_id}`

Get detailed information for a specific job.

Headers: Authorization: Bearer <token>

Response: Single Job object (same structure as POST /api/jobs)

Error:

{
  "error": "not_found",
  "message": "Job not found or access denied"
}

`DELETE /api/jobs/{job_id}`

Delete a job and all associated files.

Headers: Authorization: Bearer <token>

Response:

{
  "message": "Job deleted successfully"
}

`GET /api/jobs/{job_id}/download`

Download the CSV result file for a completed job.

Headers: Authorization: Bearer <token>

Response:

Content-Type: text/csv; charset=utf-8
Content-Disposition: attachment; filename="brief-20251007174508.csv"
Body: CSV file content

Error (Job Not Complete):

{
  "error": "not_ready",
  "message": "Job has not completed processing yet"
}

`POST /api/jobs/batch-download`

Download multiple CSV files as a ZIP archive.

Headers: Authorization: Bearer <token>

Request:

{
  "jobIds": ["uuid1", "uuid2", "uuid3"]
}

Response:

Content-Type: application/zip
Content-Disposition: attachment; filename="brief-extractor-results-{timestamp}.zip"
Body: ZIP file containing CSV files

`GET /api/jobs/{job_id}/logs`

Get processing logs for a specific job.

Headers: Authorization: Bearer <token>

Query Parameters:

limit (optional): Max log entries (default: 100)
level (optional): Filter by level (DEBUG, INFO, WARNING, ERROR)

Response:

{
  "logs": [
    {
      "timestamp": "2025-10-07T17:42:46.474Z",
      "level": "INFO",
      "message": "Starting consolidation with 2 model results using openai-gpt5"
    }
  ],
  "count": 150
}

`GET /api/jobs/stats`

Get job processing statistics for the current user.

Headers: Authorization: Bearer <token>

Response:

{
  "stats": {
    "total": 25,
    "completed": 20,
    "failed": 2,
    "processing": 3,
    "queued": 0,
    "totalAssetsExtracted": 487,
    "totalCostUsd": 5.67,
    "totalProcessingTime": 3600.5,
    "averageAssetsPerJob": 24.35,
    "averageCostPerJob": 0.283
  }
}

Configuration API

`GET /api/config/models`

List all available AI models with pricing and capabilities.

Response:

{
  "models": [
    {
      "key": "openai-gpt5",
      "name": "GPT-5",
      "provider": "OpenAI",
      "description": "Latest OpenAI model with advanced reasoning capabilities",
      "costPer1mInput": 2.50,
      "costPer1mOutput": 10.00,
      "canBePrimary": true,
      "canBeConsolidation": true
    },
    {
      "key": "anthropic-sonnet4",
      "name": "Claude Sonnet 4",
      "provider": "Anthropic",
      "description": "Balanced performance and cost",
      "costPer1mInput": 3.00,
      "costPer1mOutput": 15.00,
      "canBePrimary": true,
      "canBeConsolidation": true
    }
  ]
}

`GET /api/config/defaults`

Get default model configuration.

Response:

{
  "config": {
    "primaryModels": ["openai-gpt5", "anthropic-sonnet4", "google-gemini25"],
    "consolidationModel": "openai-gpt5",
    "minimumSuccessThreshold": 1
  }
}

`POST /api/config/estimate`

Estimate processing cost before uploading.

Request:

{
  "modelConfig": {
    "primaryModels": ["openai-gpt5", "anthropic-sonnet4"],
    "consolidationModel": "anthropic-opus4"
  },
  "fileSizeBytes": 1048576,
  "estimatedTokens": 10000
}

Response:

{
  "estimatedCostUsd": 0.45,
  "breakdown": {
    "openai-gpt5": 0.15,
    "anthropic-sonnet4": 0.12,
    "anthropic-opus4": 0.18
  },
  "estimatedTokens": {
    "input": 8000,
    "output": 6000,
    "total": 14000
  },
  "estimatedTime": "90-180 seconds"
}

`POST /api/config/validate`

Validate model configuration before submission.

Request:

{
  "modelConfig": {
    "primaryModels": ["invalid-model"],
    "consolidationModel": "openai-gpt5"
  }
}

Response:

{
  "valid": false,
  "errors": [
    "Primary model 'invalid-model' is not available"
  ],
  "warnings": [
    "Using only 1 primary model - consider using 2-3 for better accuracy"
  ],
  "modelCount": {
    "primary": 1,
    "consolidation": 1,
    "total": 2
  }
}

Data Models and Schemas

Job Data Model

@dataclass
class Job:
    id: str                                    # UUID
    file_name: str                             # Original filename
    file_size: int                             # Bytes
    created_at: datetime                       # UTC timestamp
    updated_at: datetime                       # UTC timestamp
    user_id: str                              # Azure AD user OID
    upload_path: str                          # Disk path
    output_path: Optional[str]                # CSV path (when complete)
    phase: JobPhase                           # Current processing phase
    progress_pct: int                         # 0-100
    step_label: str                           # Human-readable step
    provider_updates: Dict[str, ProviderUpdate]  # Per-model status
    error: Optional[str]                      # Error message if failed
    result_csv_url: Optional[str]             # Download endpoint
    summary: Optional[JobSummary]             # Completion summary
    logs: List[LogEntry]                      # Processing logs
    model_config: ModelConfiguration          # AI model settings

Job Phases

stateDiagram-v2
    [*] --> QUEUED: Job created
    QUEUED --> EXTRACT_CONTENT: Worker picks up
    EXTRACT_CONTENT --> LLM_ANALYSIS: Content extracted
    LLM_ANALYSIS --> CONSOLIDATION: Analysis complete
    CONSOLIDATION --> CSV_GENERATION: Results consolidated
    CSV_GENERATION --> COMPLETED: CSV written

    EXTRACT_CONTENT --> FAILED: Extraction error
    LLM_ANALYSIS --> FAILED: All models failed
    CONSOLIDATION --> FAILED: Consolidation error
    CSV_GENERATION --> FAILED: Write error

    COMPLETED --> [*]
    FAILED --> [*]

    note right of QUEUED
        Progress: 0%
        Waiting for worker
    end note

    note right of EXTRACT_CONTENT
        Progress: 10-25%
        LlamaParser API
    end note

    note right of LLM_ANALYSIS
        Progress: 25-75%
        Parallel model execution
    end note

    note right of CONSOLIDATION
        Progress: 75-90%
        Single model merging
    end note

    note right of CSV_GENERATION
        Progress: 90-100%
        File write
    end note

Provider Update Model

@dataclass
class ProviderUpdate:
    provider: str                        # 'openai', 'anthropic', 'google'
    model: str                           # 'gpt-5', 'claude-sonnet-4', etc.
    status: str                          # 'started', 'success', 'error'
    started_at: Optional[str]            # ISO timestamp
    completed_at: Optional[str]          # ISO timestamp
    latency_ms: Optional[float]          # Processing duration
    tokens_in: Optional[int]             # Input tokens
    tokens_out: Optional[int]            # Output tokens
    tokens_cached: Optional[int]         # Cached tokens (cost reduction)
    cost_usd: Optional[float]            # Estimated cost
    error: Optional[str]                 # Error message if failed

Base Deliverable Schema

Purpose: Intermediate format with multiplier arrays (before expansion)

class BaseDeliverable(BaseModel):
    # Metadata (String Fields)
    title: str                                    # Required
    status: Optional[str] = ""                   # "Draft", "In Progress", "Final"
    category: Optional[str] = ""                 # "Paid Social - Meta Feed"
    media: Optional[str] = ""                    # "IMAGE", "VIDEO", "COPY"
    asset_type: Optional[str] = ""               # "JPG", "PNG", "MP4"
    brand_identifier: Optional[str] = ""         # "adidas TERREX"

    # Multiplier Arrays (Expansion Fields)
    technical_specifications: Optional[List[str]] = []  # ["1080x1080", "1920x1080"]
    language_country_market: Optional[List[str]] = []   # ["EN-UK", "DE-DE", "IT-IT"]

    # Dates and References (String Fields)
    review_date: Optional[str] = ""              # "2024-08-08"
    live_date: Optional[str] = ""                # "08/08"
    end_date: Optional[str] = ""                 # "2025-12-31"
    reference_material: Optional[str] = ""       # URLs or notes

    # Metadata (String Fields)
    quantity: Optional[str] = "1"                # For validation
    page_number: Optional[str] = ""              # "3-4"
    priority_level: Optional[str] = ""           # "High"
    creative_direction: Optional[str] = ""       # Design requirements

Multiplier Expansion Example:

graph LR
    Base["Base Deliverable<br/><br/>Title: Hero Slider<br/>Specs: [750x1200, 1920x853]<br/>Markets: [IT-IT]<br/>Quantity: 2"]

    subgraph Expansion["Cartesian Product Expansion"]
        Combo1["750x1200 × IT-IT"]
        Combo2["1920x853 × IT-IT"]
    end

    Asset1["Asset 1<br/>Hero Slider (750x1200, IT-IT)<br/>Quantity: 1"]
    Asset2["Asset 2<br/>Hero Slider (1920x853, IT-IT)<br/>Quantity: 1"]

    Base --> Expansion
    Combo1 --> Asset1
    Combo2 --> Asset2

    style Base fill:#fff3e0
    style Expansion fill:#e3f2fd
    style Asset1 fill:#e8f5e9
    style Asset2 fill:#e8f5e9

Marketing Asset Schema

Purpose: Final individual assets for CSV export (after expansion)

class MarketingAsset(BaseModel):
    # All fields become strings (arrays expanded into individual assets)
    title: str                              # "Hero Slider (750x1200, IT-IT)"
    status: Optional[str] = ""
    category: Optional[str] = ""
    media: Optional[str] = ""
    asset_type: Optional[str] = ""
    brand_identifier: Optional[str] = ""
    technical_specifications: Optional[str] = ""    # Single value: "750x1200"
    review_date: Optional[str] = ""
    live_date: Optional[str] = ""
    end_date: Optional[str] = ""
    reference_material: Optional[str] = ""
    language_country_market: Optional[str] = ""     # Single value: "IT-IT"
    quantity: Optional[str] = "1"                   # Always "1" for individuals
    page_number: Optional[str] = ""
    priority_level: Optional[str] = ""
    creative_direction: Optional[str] = ""

CSV Output Format (16 Columns)

title,category,media,asset_type,technical_specifications,language_country_market,quantity,brand_identifier,review_date,live_date,end_date,reference_material,page_number,priority_level,creative_direction,status
"Hero Slider (750x1200, IT-IT)","Wholesale - Hero Slider","IMAGE","JPG","750x1200","IT-IT","1","adidas TERREX","2024-08-08","08/08","","https://drive.google.com/...","3-4","High","Adapt as per layouts...","Draft"

Configuration Management

Environment Variables

Backend Configuration (.env in project root):

# =============================================================================
# API KEYS (Required)
# =============================================================================
OPENAI_API_KEY=sk-...                    # OpenAI GPT-5 access
ANTHROPIC_API_KEY=sk-ant-api03-...       # Anthropic Claude access
GOOGLE_API_KEY=AIzaSy...                 # Google Gemini access
LLAMACLOUD_API_KEY=llx-...               # LlamaParser cloud service

# =============================================================================
# OPENAI CONFIGURATION
# =============================================================================
OPENAI_MODEL=gpt-5
OPENAI_REASONING_EFFORT=medium           # high, medium, low, minimal
OPENAI_TIMEOUT=3600                      # 1 hour (for long documents)
OPENAI_MAX_RETRIES=2

# =============================================================================
# ANTHROPIC CONFIGURATION
# =============================================================================
ANTHROPIC_MODEL_OPUS=claude-opus-4-1-20250805
ANTHROPIC_MODEL_SONNET=claude-sonnet-4-20250514
ANTHROPIC_TEMPERATURE=1                  # Higher for creative analysis
ANTHROPIC_MAX_TOKENS=32000              # Sonnet limit (Opus: 64000)
ANTHROPIC_THINKING_BUDGET=12000         # Thinking tokens
ANTHROPIC_TIMEOUT=300                    # 5 minutes

# =============================================================================
# GOOGLE CONFIGURATION
# =============================================================================
GOOGLE_MODEL=gemini-2.5-pro
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_OUTPUT_TOKENS=100000
GOOGLE_THINKING_BUDGET=12000
GOOGLE_TIMEOUT=3600

# =============================================================================
# PROCESSING CONFIGURATION
# =============================================================================
DEFAULT_PRIMARY_MODELS=openai-gpt5,anthropic-sonnet4,google-gemini25
DEFAULT_CONSOLIDATION_MODEL=openai-gpt5
MINIMUM_SUCCESS_THRESHOLD=1              # Min models that must succeed
ENABLE_COST_ESTIMATION=true
MAX_PROCESSING_COST_USD=10.00

# =============================================================================
# MSAL AUTHENTICATION (Azure AD)
# =============================================================================
MSAL_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef
MSAL_CLIENT_SECRET=placeholder           # Not used for PKCE flow
MSAL_TENANT_ID=e519c2e6-bc6d-4fdf-8d9c-923c2f002385
MSAL_REDIRECT_URI=https://ai-sandbox.oliver.solutions/brief-extractor/
MSAL_AUTHORITY=https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385

# =============================================================================
# SECURITY AND RUNTIME
# =============================================================================
DEV_MODE=false                           # MUST be false in production!
ALLOWED_ORIGINS=https://ai-sandbox.oliver.solutions
SESSION_SECRET=<random-secret-here>
SECURE_COOKIES=true                      # true for HTTPS
HTTPS_ONLY=true                          # true for production

# =============================================================================
# JOB PROCESSING
# =============================================================================
MAX_CONCURRENT_JOBS=5                    # Parallel job processing limit
MAX_UPLOAD_SIZE_MB=200                   # Per-file upload limit
FILE_RETENTION_HOURS=24                  # Auto-cleanup threshold
WS_PING_INTERVAL_SECONDS=30             # WebSocket heartbeat

# =============================================================================
# SERVER CONFIGURATION
# =============================================================================
SERVER_HOST=0.0.0.0
SERVER_PORT=8002
SERVER_WORKERS=2                         # Hypercorn workers (has no effect with serve)

Frontend Configuration (frontend/.env):

# Backend API and WebSocket URLs (embedded at build time)

# Production
VITE_API_URL=https://ai-sandbox.oliver.solutions/brief-extractor-back/api
VITE_WS_URL=wss://ai-sandbox.oliver.solutions/brief-extractor-back

# Local Development (comment out production, uncomment below)
# VITE_API_URL=http://localhost:8000/api
# VITE_WS_URL=ws://localhost:8000

Build Configuration (frontend/vite.config.ts):

export default defineConfig({
  base: '/brief-extractor/',              // Deployment path prefix
  plugins: [react()],
  resolve: {
    alias: {
      '@': path.resolve(__dirname, './src')  // Import alias
    }
  },
  server: {
    port: 3000,
    proxy: {                               // Dev server proxying
      '/api': {
        target: 'http://localhost:8000',
        changeOrigin: true
      },
      '/ws': {
        target: 'ws://localhost:8000',
        ws: true
      }
    }
  }
})

Configuration Loading Priority

Backend:

Environment variables from .env file
Default values in core/config.py and server/config_runtime.py
Runtime overrides (future feature)

Frontend:

Build-time environment variables (VITE_*)
Fallback defaults in code (e.g., /api for VITE_API_URL)

Deployment Architecture

Production Deployment Topology

graph TB
    Internet["Internet<br/>(HTTPS/WSS)"]

    subgraph Server["Production Server: ai-sandbox.oliver.solutions"]
        Apache["Apache Web Server<br/>Port 443<br/>- SSL/TLS Termination<br/>- Virtual Host<br/>- ProxyPass WebSocket<br/>- Serve static files"]

        subgraph Static["Static Files"]
            Frontend["/brief-extractor/<br/>/var/www/html/brief-extractor/dist/"]
        end

        subgraph Backend["Quart Application"]
            Hypercorn["Hypercorn ASGI Server<br/>Port 8002<br/>Systemd service"]
            Workers["5 Async Job Processors"]
            WSSupport["WebSocket Support"]
            Storage["File Storage<br/>/server/data/"]
        end
    end

    subgraph External["External APIs"]
        OpenAI["OpenAI API"]
        Anthropic["Anthropic API"]
        Google["Google AI API"]
        Llama["LlamaCloud API"]
        AzureAD["Azure AD<br/>Authentication"]
    end

    Internet -->|"HTTPS"| Apache
    Apache -->|"Proxy<br/>/brief-extractor-back/"| Hypercorn
    Apache -.->|"Serve"| Frontend
    Hypercorn --> Workers
    Hypercorn --> WSSupport
    Workers --> Storage
    Workers --> OpenAI
    Workers --> Anthropic
    Workers --> Google
    Workers --> Llama
    Frontend --> AzureAD

    style Apache fill:#f9f9f9
    style Hypercorn fill:#e3f2fd
    style Workers fill:#e8f5e9
    style Frontend fill:#fff3e0

Apache Configuration

# Brief Extractor - WebSocket and HTTP proxy
ProxyPass /brief-extractor-back/ws ws://localhost:8002/ws
ProxyPass /brief-extractor-back/ http://localhost:8002/
ProxyPassReverse /brief-extractor-back/ http://localhost:8002/

# Static frontend files
Alias /brief-extractor /var/www/html/brief-extractor/dist
<Directory /var/www/html/brief-extractor/dist>
    Options -Indexes +FollowSymLinks
    AllowOverride None
    Require all granted

    # SPA routing support
    RewriteEngine On
    RewriteBase /brief-extractor/
    RewriteRule ^index\.html$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /brief-extractor/index.html [L]
</Directory>

# Required Apache modules
# sudo a2enmod proxy proxy_http proxy_wstunnel rewrite

Systemd Service Configuration

Service File: /etc/systemd/system/brief-extractor.service

[Unit]
Description=Brief Extractor Backend Service
After=network.target

[Service]
Type=simple
User=www-data
WorkingDirectory=/var/www/html/brief-extractor/backend
Environment="PATH=/var/www/html/brief-extractor/backend/venv/bin:/usr/bin"
ExecStart=/var/www/html/brief-extractor/backend/venv/bin/python -m server.app
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Service Management:

# Start service
sudo systemctl start brief-extractor

# Enable on boot
sudo systemctl enable brief-extractor

# View logs
sudo journalctl -u brief-extractor -f

# Restart after code update
sudo systemctl restart brief-extractor

Build and Deployment Process

Backend Deployment:

# 1. Update code on server
cd /var/www/html/brief-extractor/backend
git pull origin main

# 2. Activate virtual environment
source venv/bin/activate

# 3. Install/update dependencies
pip install -r requirements_enhanced.txt

# 4. Update .env file (if needed)
nano .env

# 5. Restart service
sudo systemctl restart brief-extractor

# 6. Verify deployment
curl https://ai-sandbox.oliver.solutions/brief-extractor-back/health

Frontend Deployment:

# 1. On development machine, configure production URLs
cd frontend
nano .env  # Ensure VITE_API_URL points to production

# 2. Build for production
npm run build

# 3. Deploy to server
scp -r dist/* user@server:/var/www/html/brief-extractor/dist/

# 4. Verify deployment
# Visit: https://ai-sandbox.oliver.solutions/brief-extractor/

Environment-Specific Builds:

# Build for production server
VITE_API_URL=https://ai-sandbox.oliver.solutions/brief-extractor-back/api \
VITE_WS_URL=wss://ai-sandbox.oliver.solutions/brief-extractor-back \
npm run build

# Build for local development
VITE_API_URL=http://localhost:8000/api \
VITE_WS_URL=ws://localhost:8000 \
npm run build

Error Handling and Logging

Backend Logging Strategy

Structured Logging (Structlog):

import structlog

structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="ISO"),
        structlog.processors.JSONRenderer()  # JSON for production
    ]
)

logger = structlog.get_logger(__name__)
logger.info("Event occurred", key="value", user_id=user_id)

Output Format:

{
  "event": "Job processing completed",
  "logger": "server.runners.job_runner",
  "level": "info",
  "timestamp": "2025-10-07T17:45:08.132Z",
  "job_id": "uuid",
  "assets_extracted": 19,
  "cost_usd": 0.2759
}

Log Levels:

DEBUG: Detailed diagnostic information (disabled in production)
INFO: General informational messages (job lifecycle, API calls)
WARNING: Warning conditions (token validation issues, model failures)
ERROR: Error events (job failures, API errors, exceptions)

Key Logging Points:

Job Lifecycle:

logger.info(f"Created job {job.id} for file {file_name} (user: {user_id})")
logger.info(f"Processing job {job_id}: {job.file_name}")
logger.info(f"Job {job_id} completed successfully: {assets} assets, ${cost}, {time}s")
logger.error(f"Job {job_id} failed: {error}", exc_info=True)

AI Model Calls:

# Standard success logging
logger.info(f"[INITIAL] Structured output validated: 9 assets")

# Verbose error logging (only when problems occur)
logger.error(f"[CONSOLIDATION] ========== MISSING 'assets' KEY ==========")
logger.error(f"[CONSOLIDATION] Full raw content: {response.content}")
logger.error(f"[CONSOLIDATION] Debug file saved: /tmp/consolidation_debug_*.json")

WebSocket Events:

logger.info(f"Registered WebSocket client {client_id} for user {user_id}")
logger.warning(f"WebSocket connection rejected - no valid authentication")
logger.debug(f"Broadcast message to {sent_count} clients for user {user_id}")

Error Handling Patterns

API Error Responses:

# Validation Error (400)
return jsonify({
    'error': 'invalid_request',
    'message': 'Specific validation error details'
}), 400

# Authentication Error (401)
return jsonify({
    'error': 'unauthorized',
    'message': 'Valid authentication required'
}), 401

# Not Found (404)
return jsonify({
    'error': 'not_found',
    'message': 'Job not found or access denied'
}), 404

# Internal Error (500)
return jsonify({
    'error': 'server_error',
    'message': 'Internal server error'
}), 500

Job Processing Errors:

try:
    # Processing stages...
    result = await analyzer.process_document_multi_model(...)

except Exception as e:
    # Capture and report error
    error_msg = f"Job processing failed: {str(e)}"
    logger.error(f"Job {job.id} failed: {error_msg}", exc_info=True)

    # Update job state
    await progress.emit_failure(error_msg)
    job.mark_failed(error_msg)

    # Broadcast via WebSocket
    await ws_manager.broadcast_job_update(job.id, {
        'type': 'job.failed',
        'jobId': job.id,
        'error': error_msg
    })

    return False  # Job failed

Partial Failure Handling:

# LLM Analysis - allow partial success
responses, metadata = await provider_manager.execute_parallel_analysis(
    models=['openai-gpt5', 'anthropic-sonnet4', 'google-gemini25'],
    minimum_success_threshold=1  # At least 1 must succeed
)

# If 2 out of 3 models fail, processing continues with 1 result
# Consolidation still occurs, just with less data diversity

Debug Artifact Generation

When Consolidation Fails:

# Automatic debug file creation
debug_file = f"/tmp/consolidation_debug_{timestamp}.json"

{
  "timestamp": "20251007_174500",
  "consolidation_model": "gpt-5",
  "raw_content": "{}",  # Empty response
  "parsed_data": {},
  "primary_analysis_results": [
    {
      "provider": "anthropic",
      "model": "claude-sonnet-4",
      "success": true,
      "deliverable_count": 9
    }
  ],
  "token_usage": {...}
}

Location: /tmp/ directory on server Purpose: Post-mortem analysis of API responses Includes: Full request context, model outputs, token stats

Performance and Scalability

Performance Characteristics

Processing Times (Typical 10-page Brief):

Content Extraction: 10-30 seconds (LlamaParser)
Parallel Analysis: 50-120 seconds (limited by slowest model)
- GPT-5: 90-110 seconds
- Claude Sonnet: 60-80 seconds
- Gemini: 40-50 seconds
Consolidation: 60-90 seconds (single model)
CSV Generation: <1 second
Total: 2-4 minutes end-to-end

Token Usage (Typical):

Input: 8,000-12,000 tokens (document + prompt)
Output: 2,000-6,000 tokens per model
Total: 30,000-50,000 tokens across all models

Cost (Typical):

3-Model Analysis: $0.20-$0.40
With Premium Consolidation (Opus): +$0.15-$0.25
Average Per Document: $0.25-$0.45

Scalability Analysis

Concurrent Processing:

MAX_CONCURRENT_JOBS=5 → 5 documents processed simultaneously
Each job uses 3 primary models + 1 consolidation = 4 LLM calls

Maximum concurrent LLM calls: 5 jobs × 3 models = 15 parallel API calls
(Consolidation sequential within each job)

Memory Footprint:

Per Job:
- Uploaded file: ~1-50 MB (in memory during upload, then on disk)
- Extracted content: ~50-500 KB (markdown)
- LLM responses: ~10-50 KB each × 4 models = 40-200 KB
- Job metadata: ~5-10 KB
Total per job: ~1-50 MB (mostly file content)

With 5 concurrent jobs: ~5-250 MB total

Network Bandwidth:

Upload: User → Server
- Per document: 1-50 MB
- Rate limiting: None (could add via nginx)

LLM API Calls: Server → AI Providers
- Per request: 10-100 KB (prompts)
- Per response: 5-50 KB (structured JSON)
- Concurrent: 15 simultaneous connections

Download: Server → User
- Per CSV: 5-500 KB (typically < 100 KB)
- Batch ZIP: Up to 10 MB for large batches

Bottlenecks and Optimizations

Current Bottlenecks:

LLM API Latency: 50-120 seconds (external dependency)
- Mitigation: Parallel execution reduces total time
- Future: Caching for similar documents
File Upload Speed: Network dependent
- Mitigation: Chunked upload (future)
- Compression: Could reduce bandwidth 50-70%
Concurrent Job Limit: MAX_CONCURRENT_JOBS=5
- Rationale: Cost control, API rate limits
- Tunable: Can increase to 10-20 with monitoring

Optimization Strategies:

1. Token Caching (Prompt Caching):

# OpenAI and Anthropic support prompt caching
# Repeated analysis of similar documents reuses cached context
# Savings: 50-90% on input token costs

2. Result Caching (Future):

# Cache analysis results by document hash
# If same file uploaded again, return cached result
# Savings: 100% cost reduction for duplicates

3. Async Everything:

# File I/O: run_in_executor() for blocking operations
# Database: Currently in-memory (future: async DB driver)
# API calls: Native async clients (AsyncOpenAI, AsyncAnthropic)

4. Smart Model Selection:

# Cost-optimized: GPT-5 + Gemini (cheapest)
# Quality-optimized: All 3 models
# Speed-optimized: Sonnet + Gemini (fastest)

Monitoring and Observability

Health Check Endpoint:

curl https://ai-sandbox.oliver.solutions/brief-extractor-back/health

{
  "status": "healthy",
  "timestamp": "2025-10-07T17:45:00.000Z",
  "queue": {
    "pending": 2,
    "active": 3
  },
  "websockets": {
    "total_connections": 5,
    "unique_users": 3
  },
  "config": {
    "devMode": false,
    "maxConcurrentJobs": 5,
    "maxUploadSize": "200MB"
  }
}

Metrics to Monitor:

Queue depth (queue.pending)
Active jobs (queue.active)
WebSocket connection count
Average processing time per job
Error rate (failed jobs / total jobs)
API response times
Token usage and costs

Logging Integration:

# Systemd journal
sudo journalctl -u brief-extractor -f --since "1 hour ago"

# Filter by log level
sudo journalctl -u brief-extractor -p err

# Filter by job ID
sudo journalctl -u brief-extractor | grep "job_id=uuid"

Development Guide

Local Development Setup

Prerequisites:

Python 3.13+ with virtual environment support
Node.js 18+ with npm
Git for version control
Azure AD app registration (for auth testing)

Backend Setup:

# 1. Clone repository
git clone <repo_url>
cd adi-o3-multipass

# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # Mac/Linux
# or: venv\Scripts\activate  # Windows

# 3. Install dependencies
pip install -r requirements_enhanced.txt

# 4. Configure environment
cp .env.example .env
nano .env  # Add API keys, set DEV_MODE=true

# 5. Verify configuration
python -c "from core.config import config; print(config.validate_api_keys())"

# 6. Run development server
python -m server.app
# Server starts on http://0.0.0.0:8000

Frontend Setup:

# 1. Navigate to frontend
cd frontend

# 2. Install dependencies
npm install

# 3. Configure environment
cp .env.example .env
nano .env  # Set local development URLs

# Example .env for local dev:
# VITE_API_URL=http://localhost:8000/api
# VITE_WS_URL=ws://localhost:8000

# 4. Start development server
npm run dev
# Server starts on http://localhost:3000

# 5. Open browser
open http://localhost:3000

Development Workflow

Typical Development Session:

# Terminal 1: Backend
cd adi-o3-multipass
source venv/bin/activate
python -m server.app

# Terminal 2: Frontend
cd adi-o3-multipass/frontend
npm run dev

# Terminal 3: Logs
sudo journalctl -u brief-extractor -f
# or: tail -f server/processing.log

Hot Reload:

Frontend: Vite HMR (instant updates on save)
Backend: Manual restart required (no auto-reload in production mode)
- Development: Set DEBUG=true for auto-reload (not recommended for async code)

Testing

Backend Unit Tests (Future):

# tests/test_job_manager.py
async def test_create_job():
    manager = JobManager.get_instance()
    job = await manager.create_job(
        file_name="test.pdf",
        file_size=1024,
        file_data=b"...",
        user_id="test-user"
    )
    assert job.phase == JobPhase.QUEUED

Frontend Testing:

# Type checking
npm run type-check

# Linting
npm run lint

# Unit tests (future)
npm run test

Integration Testing:

# Test full pipeline with sample document
python core/process_brief_enhanced.py examples/sample_brief.pdf \
  --primary-models openai-gpt5,anthropic-sonnet4 \
  --consolidation-model anthropic-opus4

Debugging Tips

Backend Debugging:

# Add breakpoint
import pdb; pdb.set_trace()

# Enhanced logging for specific job
logger = logging.getLogger(f"job.{job_id}")
logger.setLevel(logging.DEBUG)

# Inspect job state
job = await job_manager.get_job(job_id)
print(f"Phase: {job.phase}, Progress: {job.progress_pct}%")
print(f"Providers: {job.provider_updates}")

Frontend Debugging:

// Access store from console
import { useJobStore } from '@/store/jobStore'
const jobs = useJobStore.getState().jobs
console.log(jobs)

// WebSocket debugging
import { wsClient } from '@/services/websocket'
console.log(wsClient.isConnected())
console.log(wsClient.getConnectionState())

// Force reconnect
wsClient.forceReconnect()

Common Debug Scenarios:

Jobs not appearing in queue:

# Check WebSocket connection
# Frontend console: wsClient.getConnectionState()
# Backend logs: grep "WebSocket" /var/log/

# Check user isolation
# Backend: Verify user_id matches between job creation and WebSocket
# Log: "Created job ... (user: {user_id})"
# Log: "WebSocket authenticated ... for user: {user_id}"

Consolidation returning empty:

# Check debug files
ls /tmp/consolidation_debug_*.json
cat /tmp/consolidation_debug_20251007_174500.json

# Check OpenAI library version
pip show openai
# Should be >= 1.0.0 for responses.parse() support

Troubleshooting Guide

Common Issues and Resolutions

Issue: "Development Mode" banner shows in production

Symptoms:

Login page shows yellow "Development Mode" banner
Authentication bypassed

Root Cause: Backend DEV_MODE=true in .env

Resolution:

# Edit backend .env file
nano .env
# Change: DEV_MODE=false

# Restart backend
sudo systemctl restart brief-extractor

# Verify
curl https://ai-sandbox.oliver.solutions/brief-extractor-back/api/auth/config
# Should return: "devMode": false

Issue: WebSocket connect/disconnect loop

Symptoms:

Frontend logs show rapid connect/disconnect
Backend logs: 'str' object has no attribute 'value'

Root Cause: Job phase serialization issue when phase is string instead of enum

Resolution: Ensure server/jobs/models.py has defensive phase handling:

def to_dict(self):
    phase_value = self.phase.value if isinstance(self.phase, JobPhase) else self.phase
    return {'phase': phase_value, ...}

Issue: Jobs don't appear in queue after upload

Symptoms:

Upload succeeds
Must refresh page to see job

Root Cause: WebSocket user ID mismatch (session ID vs real user ID)

Resolution:

Ensure WebSocket authenticates with query parameter: ?token=<jwt>
Backend extracts user ID from token: websocket.args.get('token')
WebSocket client registered with real user ID, not session ID

Issue: GPT-5 consolidation returns empty object

Symptoms:

Backend logs: Missing 'assets' key in consolidated response
Consolidation phase fails

Root Cause: Outdated OpenAI library

Resolution:

# Upgrade OpenAI library
pip install --upgrade openai

# Verify version
pip show openai
# Should be >= 1.0.0

# Restart backend
sudo systemctl restart brief-extractor

Issue: 404 errors for assets (index-.js, index-.css)

Symptoms:

Browser console: Loading module blocked - disallowed MIME type
Assets return HTML (404 page) instead of JS/CSS

Root Cause: Incorrect base path in Vite config

Resolution:

// frontend/vite.config.ts
export default defineConfig({
  base: '/brief-extractor/',  // Must match deployment path
  ...
})

// Rebuild
npm run build

// Verify built index.html
cat dist/index.html
// Should show: src="/brief-extractor/assets/index-*.js"

Issue: WebSocket 400 Bad Request

Symptoms:

Backend logs: GET /ws 1.1" 400
WebSocket never establishes

Root Cause: Apache not configured for WebSocket upgrade

Resolution:

# Add to Apache config
ProxyPass /brief-extractor-back/ws ws://localhost:8002/ws
ProxyPass /brief-extractor-back/ http://localhost:8002/
ProxyPassReverse /brief-extractor-back/ http://localhost:8002/

# Enable required modules
sudo a2enmod proxy proxy_http proxy_wstunnel rewrite

# Reload Apache
sudo systemctl reload apache2

Issue: CORS errors in browser

Symptoms:

Browser console: CORS policy: No 'Access-Control-Allow-Origin' header
API calls fail

Root Cause: ALLOWED_ORIGINS misconfigured

Resolution:

# Edit backend .env
ALLOWED_ORIGINS=https://ai-sandbox.oliver.solutions

# NOT: https://ai-sandbox.oliver.solutions/brief-extractor
# (Don't include path, just domain)

# Restart backend
sudo systemctl restart brief-extractor

Issue: File upload fails with 413 Payload Too Large

Symptoms:

Large files rejected
Error: "File size exceeds XMB limit"

Root Cause: Upload size limit too small

Resolution:

# Backend .env
MAX_UPLOAD_SIZE_MB=200  # Increase if needed

# Also check Apache/Nginx limits
# Apache: LimitRequestBody 209715200  # 200MB in bytes
# Nginx: client_max_body_size 200M;

# Restart services
sudo systemctl restart brief-extractor apache2

Advanced Topics

Multi-Model Consolidation Algorithm

Normalization Phase:

# 1. Title Normalization
# Input: "1234 - Location A", "Store B - Hero Slider"
# Output: "Wholesale - Hero Slider (Campaign)"

# 2. Category Normalization
# Input: "Paid Social", "Social Media - Paid"
# Output: "Paid Social"

# 3. Specifications Normalization
# Input: ["1080 × 1080", "1080x1080 px"]
# Output: ["1080x1080"]

Deduplication Strategy:

# Build deduplication key
key = (
    normalized_title,
    normalized_category,
    media,
    tuple(sorted(technical_specifications)),
    asset_type
)

# Merge assets with same key
merged_asset = {
    **base_asset,
    'technical_specifications': union(specs1, specs2),
    'language_country_market': union(markets1, markets2),
    'quantity': max(qty1, qty2)
}

Quality Enhancement:

# For each field, choose best value from all models
reference_material = longest([model1.ref, model2.ref, model3.ref])
creative_direction = most_detailed([model1.dir, model2.dir, model3.dir])

Custom Prompt Engineering

Prompt Structure:

prompts/
├── system_multi_perspective.txt          # System message for analysis
├── multi_perspective_analysis.txt        # User prompt template
├── consolidation_analysis.txt            # Consolidation strategy
└── universal_schema.json                 # Output schema

Customization:

# Edit prompts
nano prompts/multi_perspective_analysis.txt

# Changes take effect immediately (loaded at runtime)
# No need to restart backend

Prompt Variables:

# multi_perspective_analysis.txt
# Uses: {doc_type}, {document_content}

# consolidation_analysis.txt
# Uses: {models_results}

Adding New AI Models

Step 1: Create Provider Class

# core/llm_service/new_provider.py
class NewProvider(BaseLLMProvider):
    async def generate_response(self, messages, schema, **kwargs):
        # Implementation
        pass

Step 2: Register in Provider Manager

# core/llm_service/provider_manager.py
elif provider_name == 'newprovider':
    return NewProvider(model_name=model_name)

Step 3: Add Configuration

# core/config.py
MODEL_MAPPINGS = {
    'newprovider-model1': ('newprovider', 'model-1'),
}

PRICING = {
    'newprovider-model1': {
        'input': 1.00,
        'output': 3.00
    }
}

Step 4: Update Frontend

// server/jobs/manager.py
model_info_map = {
    'newprovider-model1': ModelInfo(
        key='newprovider-model1',
        name='New Model',
        provider='NewProvider',
        ...
    )
}

Appendix

File Location Reference

Configuration Files:

Backend config: /adi-o3-multipass/.env
Frontend config: /adi-o3-multipass/frontend/.env
Server config: /adi-o3-multipass/server/config_runtime.py
Core config: /adi-o3-multipass/core/config.py
Vite config: /adi-o3-multipass/frontend/vite.config.ts

Prompt Templates:

System prompt: /adi-o3-multipass/prompts/system_multi_perspective.txt
Analysis prompt: /adi-o3-multipass/prompts/multi_perspective_analysis.txt
Consolidation prompt: /adi-o3-multipass/prompts/consolidation_analysis.txt
Schema: /adi-o3-multipass/prompts/universal_schema.json

Data Directories (Production):

Uploads: /var/www/html/brief-extractor/backend/server/data/uploads/
Outputs: /var/www/html/brief-extractor/backend/server/data/outputs/
Debug files: /tmp/consolidation_debug_*.json, /tmp/openai_debug_*.txt

Logs:

Systemd journal: journalctl -u brief-extractor
Application logs: Structured JSON to stdout (captured by systemd)

Port Reference

Service	Port	Protocol	Purpose
Frontend Dev Server	3000	HTTP	Local development
Backend Dev Server	8000	HTTP	Local development
Backend Production	8002	HTTP	Production (behind Apache)
Apache Web Server	443	HTTPS	Public-facing

Key URLs

Production:

Frontend: https://ai-sandbox.oliver.solutions/brief-extractor/
Backend API: https://ai-sandbox.oliver.solutions/brief-extractor-back/api
Backend WS: wss://ai-sandbox.oliver.solutions/brief-extractor-back/ws
Health Check: https://ai-sandbox.oliver.solutions/brief-extractor-back/health

Development:

Frontend: http://localhost:3000
Backend API: http://localhost:8000/api
Backend WS: ws://localhost:8000/ws
Health Check: http://localhost:8000/health

External Service Dependencies

Service	Purpose	API Key Variable	Endpoint
OpenAI	GPT-5 model access	`OPENAI_API_KEY`	`https://api.openai.com/v1/responses`
Anthropic	Claude models	`ANTHROPIC_API_KEY`	`https://api.anthropic.com/v1/messages`
Google AI	Gemini models	`GOOGLE_API_KEY`	`https://generativelanguage.googleapis.com`
LlamaCloud	Document parsing	`LLAMACLOUD_API_KEY`	`https://api.cloud.llamaindex.ai`
Microsoft Azure AD	Authentication	(MSAL config)	`https://login.microsoftonline.com`

Version Compatibility

Minimum Versions:

Python: 3.13+
Node.js: 18+
OpenAI library: 1.0.0+ (for responses API)
Anthropic library: 0.67.0+ (for async client)
Google GenAI: 0.4.0+ (for new SDK)

Browser Support:

Chrome/Edge: 90+
Firefox: 88+
Safari: 14+
WebSocket support required

Security Best Practices

Production Checklist

DEV_MODE=false in backend .env
SECURE_COOKIES=true in backend .env
HTTPS_ONLY=true in backend .env
Strong SESSION_SECRET (min 32 random chars)
ALLOWED_ORIGINS restricted to production domain
Apache/Nginx configured with SSL certificates
WebSocket proxy configured with SSL
File upload size limits enforced
CORS properly configured
API keys rotated regularly
Systemd service runs as restricted user (www-data)
File permissions: uploads/outputs not world-readable
Regular security updates (pip, npm)
Monitoring and alerting configured
Backup strategy for job data (if needed)

Data Privacy Considerations

User Data:

User emails/names from Azure AD (PII)
Uploaded documents may contain confidential business information
Generated CSVs contain extracted marketing data

Retention Policy:

Jobs auto-deleted after FILE_RETENTION_HOURS (24h default)
Uploaded files and CSVs deleted with job
Logs may contain user identifiers (consider log retention)

GDPR Compliance:

User data isolated per user_id
Automatic data deletion (24h retention)
Right to deletion: DELETE /api/jobs/{id}
Data portability: CSV download
Audit trail: Structured logs with user_id

Change Log

Version 2.0 (October 2025)

New Features:

Multi-tenant architecture with user isolation
Microsoft Azure AD authentication (MSAL)
Real-time WebSocket communication
Enhanced logging with conditional verbosity
Improved error handling and debug artifacts

Improvements:

PKCE authentication flow (more secure)
Parallel model execution (2-3x faster)
Smart WebSocket reconnection
Comprehensive API documentation

Bug Fixes:

GPT-5 consolidation empty response (OpenAI library upgrade)
WebSocket authentication with query parameters
Job phase serialization (enum vs string handling)
Frontend base path configuration for subpath deployment

Version 1.0 (September 2025)

Initial release with multi-model analysis
LlamaParser integration
Basic web interface
CLI support

Support and Contact

Documentation Updates: This document should be updated whenever:

Architecture changes are made
New features are added
Configuration options change
Deployment procedures are modified

Getting Help:

Review CLAUDE.md for project overview
Check troubleshooting guide for common issues
Review backend logs: journalctl -u brief-extractor
Review frontend console for client-side errors
Check debug artifacts in /tmp/ for detailed diagnostics

Document End

94 KiB Raw Permalink Blame History Unescape Escape

Brief Extractor - Comprehensive Technical Documentation v2.0

Table of Contents

Executive Summary

Key Features

Technology Stack

System Architecture Overview

High-Level Architecture

Component Communication Flow

Deployment Architecture

Backend Architecture

Technology Stack

Directory Structure

Core Components

1. Application Factory (server/app.py)

2. Job Manager (server/jobs/manager.py)

3. Storage Manager (server/jobs/storage.py)

4. Job Processing Pipeline (server/runners/job_runner.py)

5. LLM Service Layer (core/llm_service/)

6. Consolidation System (core/consolidation_processor.py)

7. Authentication System (server/auth/)

8. WebSocket Manager (server/ws/manager.py)

9. API Endpoints

Frontend Architecture

Technology Stack

Directory Structure

State Management Architecture

Authentication Store (store/authStore.ts)

Job Store (store/jobStore.ts)

Component Architecture

Dashboard Component (components/Dashboard.tsx)

Upload Panel (components/upload/UploadPanel.tsx)

Queue View (components/queue/QueueView.tsx)

WebSocket Client (services/websocket.ts)

Data Flow and Processing Pipeline

Complete Processing Flow

Parallel Processing Optimization

Authentication and Security

Microsoft Azure AD Integration

Security Mechanisms

Azure AD Configuration Requirements

WebSocket Real-Time Communication

Architecture

Message Protocol

Frontend WebSocket Implementation

Connection Resilience

API Reference

Authentication API

GET /api/auth/config

POST /api/auth/validate

GET /api/auth/user

POST /api/auth/logout

Jobs API

POST /api/jobs

GET /api/jobs

GET /api/jobs/{job_id}

DELETE /api/jobs/{job_id}

GET /api/jobs/{job_id}/download

POST /api/jobs/batch-download

GET /api/jobs/{job_id}/logs

GET /api/jobs/stats

Configuration API

GET /api/config/models

GET /api/config/defaults

POST /api/config/estimate

POST /api/config/validate

Data Models and Schemas

Job Data Model

Job Phases

Provider Update Model

Base Deliverable Schema

Marketing Asset Schema

CSV Output Format (16 Columns)

Configuration Management

Environment Variables

Configuration Loading Priority

Deployment Architecture

Production Deployment Topology

Apache Configuration

Systemd Service Configuration

94 KiB

Raw Permalink Blame History

1. Application Factory (`server/app.py`)

2. Job Manager (`server/jobs/manager.py`)

3. Storage Manager (`server/jobs/storage.py`)

4. Job Processing Pipeline (`server/runners/job_runner.py`)

5. LLM Service Layer (`core/llm_service/`)

6. Consolidation System (`core/consolidation_processor.py`)

7. Authentication System (`server/auth/`)

8. WebSocket Manager (`server/ws/manager.py`)

Authentication Store (`store/authStore.ts`)

Job Store (`store/jobStore.ts`)

Dashboard Component (`components/Dashboard.tsx`)

Upload Panel (`components/upload/UploadPanel.tsx`)

Queue View (`components/queue/QueueView.tsx`)

WebSocket Client (`services/websocket.ts`)

`GET /api/auth/config`

`POST /api/auth/validate`

`GET /api/auth/user`

`POST /api/auth/logout`

`POST /api/jobs`

`GET /api/jobs`

`GET /api/jobs/{job_id}`

`DELETE /api/jobs/{job_id}`

`GET /api/jobs/{job_id}/download`

`POST /api/jobs/batch-download`

`GET /api/jobs/{job_id}/logs`

`GET /api/jobs/stats`

`GET /api/config/models`

`GET /api/config/defaults`

`POST /api/config/estimate`

`POST /api/config/validate`