Brief Extractor - Comprehensive Technical Documentation v2.0
Document Version: 2.0
Last Updated: October 7, 2025
Application Version: 1.0.0
Author: Technical Documentation Team
Table of Contents
- Executive Summary
- System Architecture Overview
- Backend Architecture
- Frontend Architecture
- Data Flow and Processing Pipeline
- Authentication and Security
- WebSocket Real-Time Communication
- API Reference
- Data Models and Schemas
- Configuration Management
- Deployment Architecture
- Error Handling and Logging
- Performance and Scalability
- Development Guide
- Troubleshooting Guide
Executive Summary
The Brief Extractor is an enterprise-grade, multi-tenant document analysis platform that leverages multiple cutting-edge AI models (OpenAI GPT-5, Anthropic Claude, Google Gemini) in parallel to extract structured marketing asset information from unstructured creative briefs and presentations.
Key Features
- Multi-Model AI Analysis: Parallel processing using 3+ AI models simultaneously for comprehensive data extraction
- Intelligent Consolidation: Advanced deduplication and merging of multi-model results
- Real-Time Progress Tracking: WebSocket-based live updates with provider-specific progress reporting
- Enterprise Authentication: Microsoft Azure AD (MSAL) SSO integration with PKCE flow
- Multi-Tenant Architecture: Complete user isolation with per-user job queuing and data segregation
- Scalable Processing: Asynchronous job queue with configurable concurrency limits
- Production-Ready: Comprehensive error handling, logging, monitoring, and recovery mechanisms
Technology Stack
Backend:
- Framework: Quart (async Python web framework)
- AI Models: OpenAI GPT-5, Anthropic Claude Opus 4.1/Sonnet 4, Google Gemini 2.5 Pro
- Document Processing: LlamaParser cloud service for OCR and extraction
- Authentication: MSAL (Microsoft Authentication Library) with JWT validation
- Real-Time: WebSocket with automatic reconnection and health monitoring
- Data Storage: File-based storage with automatic cleanup and retention policies
Frontend:
- Framework: React 18 with TypeScript
- Build Tool: Vite 5 with HMR and optimized production builds
- State Management: Zustand for global state, TanStack Query for server state
- UI Framework: Tailwind CSS with custom design system
- Authentication: MSAL React with Azure AD integration
- Real-Time: Native WebSocket client with exponential backoff reconnection
System Architecture Overview
High-Level Architecture
graph TB
subgraph Browser["Client Browser (Frontend)"]
React["React Application<br/>- Upload UI<br/>- Queue View<br/>- Authentication"]
WSClient["WebSocket Client<br/>- Live Updates<br/>- Auto-Reconnect<br/>- Connection Health"]
React <--> WSClient
end
subgraph WebServer["Web Server (Apache/Nginx)"]
SSL["SSL/TLS Termination"]
WSProxy["WebSocket Upgrade Proxy"]
Static["Static File Serving"]
end
subgraph Backend["Quart Application Server (Backend)"]
subgraph API["API Layer"]
AuthAPI["Auth API"]
JobsAPI["Jobs API"]
ConfigAPI["Config API"]
end
subgraph Queue["Job Queue System"]
AsyncQueue["Async Queue"]
Workers["Workers (5)"]
Semaphore["Semaphore"]
end
subgraph WS["WebSocket Manager"]
Connections["Connections"]
Broadcasting["Broadcasting"]
UserTargeting["User Targeting"]
end
subgraph Processing["Job Processing Engine"]
Extract["Content Extraction<br/>(LlamaParser)"]
Analysis["Parallel Multi-Model<br/>Analysis"]
Consolidation["Result Consolidation"]
CSV["CSV Generation"]
end
end
subgraph External["External Services"]
OpenAI["OpenAI API<br/>(GPT-5)"]
Anthropic["Anthropic API<br/>(Claude)"]
Google["Google AI API<br/>(Gemini)"]
Llama["LlamaCloud API<br/>(Parsing)"]
end
React -->|"HTTPS/REST API"| WebServer
WSClient -->|"WSS (WebSocket)"| WebServer
WebServer --> API
WebServer --> WS
API --> Queue
Queue --> Processing
Processing --> OpenAI
Processing --> Anthropic
Processing --> Google
Processing --> Llama
Processing --> WS
Component Communication Flow
sequenceDiagram
actor User
participant Frontend
participant API as API Endpoint
participant JobMgr as Job Manager
participant Queue as Job Queue
participant Worker as Worker Pool
participant LLM as LLM Services
participant WS as WebSocket
User->>Frontend: Upload File
Frontend->>API: POST /api/jobs
API->>JobMgr: create_job()
JobMgr->>JobMgr: Save file to disk
JobMgr->>Queue: Enqueue job_id
JobMgr->>WS: Broadcast job.created
WS->>Frontend: Job created event
API->>Frontend: Job[] response
Queue->>Worker: Pull job_id
Worker->>Worker: Phase 1: Extract Content
Worker->>LLM: LlamaParser API
LLM->>Worker: Markdown content
Worker->>WS: Progress update (25%)
WS->>Frontend: job.progress event
Worker->>Worker: Phase 2: Parallel Analysis
par Parallel Execution
Worker->>LLM: OpenAI GPT-5
Worker->>LLM: Anthropic Claude
Worker->>LLM: Google Gemini
end
LLM->>Worker: All results
Worker->>WS: Progress update (75%)
WS->>Frontend: job.progress event
Worker->>Worker: Phase 3: Consolidation
Worker->>LLM: Consolidation model
LLM->>Worker: Merged results
Worker->>WS: Progress update (90%)
Worker->>Worker: Phase 4: Generate CSV
Worker->>WS: job.completed
WS->>Frontend: Completion event
Frontend->>User: Show download button
Deployment Architecture
Production Deployment:
- Frontend: https://ai-sandbox.oliver.solutions/brief-extractor/
- Backend API: https://ai-sandbox.oliver.solutions/brief-extractor-back/api
- Backend WebSocket: wss://ai-sandbox.oliver.solutions/brief-extractor-back/ws
Development Environment:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000/api
- Backend WebSocket: ws://localhost:8000/ws
Backend Architecture
Technology Stack
Core Framework: Quart 0.19+ (async Python web framework that mirrors the Flask API)
- Chosen for native async/await support required for parallel LLM calls
- ASGI-based for WebSocket support
- Compatible with Hypercorn ASGI server
Key Dependencies:
- quart - Async web framework
- quart-cors - CORS middleware for cross-origin requests
- openai>=1.0.0 - OpenAI GPT-5 client with Responses API
- anthropic>=0.67.0 - Anthropic Claude client with async support
- google-genai[aiohttp]>=0.4.0 - Google Gemini client with aiohttp
- llama-cloud-services>=0.6.62 - LlamaParser document extraction
- msal>=1.24.0 - Microsoft Authentication Library
- PyJWT>=2.8.0 - JWT token validation
- structlog - Structured logging for production environments
- python-dotenv - Environment variable management
- pydantic>=2.0.0 - Data validation and schema definition
Directory Structure
graph LR
subgraph Server["server/ - Backend Application"]
App["app.py<br/>Main application"]
Config["config_runtime.py<br/>Runtime config"]
subgraph API["api/ - REST Endpoints"]
AuthAPI["auth.py<br/>/api/auth/*"]
ConfigAPI["config.py<br/>/api/config/*"]
JobsAPI["jobs.py<br/>/api/jobs/*"]
end
subgraph Auth["auth/ - Authentication"]
MSAL["msal_auth.py<br/>JWT validation"]
Middleware["middleware.py<br/>Decorators"]
end
subgraph Jobs["jobs/ - Job Management"]
Models["models.py<br/>Data models"]
Manager["manager.py<br/>Singleton registry"]
Storage["storage.py<br/>File operations"]
end
subgraph Runners["runners/ - Execution"]
JobRunner["job_runner.py<br/>Workers"]
EnhancedAnalyzer["enhanced_analyzer.py<br/>Progress hooks"]
Progress["progress.py<br/>Reporting"]
end
subgraph WS["ws/ - WebSocket"]
WSManager["manager.py<br/>Connections"]
end
end
subgraph Core["core/ - Processing Engine"]
CoreConfig["config.py<br/>Model config"]
ProcessBrief["process_brief_enhanced.py<br/>DocumentAnalyzer"]
Consolidation["consolidation_processor.py<br/>Result merging"]
subgraph LLMService["llm_service/ - Providers"]
Base["base_provider.py<br/>Abstract interface"]
OpenAI["openai_provider.py<br/>GPT-5"]
Anthropic["anthropic_provider.py<br/>Claude"]
GoogleProv["google_provider.py<br/>Gemini"]
ProvManager["provider_manager.py<br/>Parallel coordinator"]
end
end
App --> API
App --> Auth
App --> Jobs
App --> Runners
App --> WS
Runners --> Core
style Server fill:#e3f2fd
style Core fill:#e8f5e9
Core Components
1. Application Factory (server/app.py)
Purpose: Creates and configures the Quart application with all routes, middleware, and lifecycle hooks.
Key Responsibilities:
- Register API blueprints (auth_bp, config_bp, jobs_bp)
- Configure CORS for cross-origin requests
- Initialize WebSocket manager and job queue
- Set up application lifecycle (before_serving, after_serving)
- Start/stop background worker tasks
- Define health check and WebSocket endpoints
- Configure error handlers (400, 401, 403, 404, 413, 500)
Lifecycle Management:
@app.before_serving
async def startup():
    # Start WebSocket background tasks (ping, cleanup)
    await ws_manager.start_background_tasks()
    # Start job processing workers (configurable count)
    background_workers = await start_background_workers(
        job_manager, ws_manager,
        num_workers=server_config.MAX_CONCURRENT_JOBS
    )
    # Schedule periodic cleanup (hourly)
    cleanup_task = asyncio.create_task(periodic_cleanup(job_manager))

@app.after_serving
async def shutdown():
    # Stop all background workers gracefully
    await stop_background_workers(background_workers)
    await ws_manager.stop_background_tasks()
Critical Configuration:
- MAX_CONTENT_LENGTH: File upload size limit (200MB default)
- SESSION_SECRET: Used for secure cookie signing
- SECURE_COOKIES, HTTPS_ONLY: Security flags for production
2. Job Manager (server/jobs/manager.py)
Pattern: Thread-safe Singleton
Purpose: Central registry and queue for all processing jobs
Architecture:
classDiagram
class JobManager {
<<singleton>>
-_instance: JobManager
-jobs: Dict[str, Job]
-queue: asyncio.Queue
-processing_semaphore: Semaphore
-storage: StorageManager
-_lock: asyncio.Lock
+create_job(file, user_id) Job
+get_job(job_id) Job
+get_user_jobs(user_id) List~Job~
+delete_job(job_id) bool
+cleanup_expired_jobs() int
+serialize_all() List~Dict~
+get_instance()$ JobManager
}
class Job {
+id: str
+user_id: str
+phase: JobPhase
+progress_pct: int
+provider_updates: Dict
+logs: List~LogEntry~
+model_config: ModelConfiguration
+update_progress(phase, pct, label)
+mark_completed(url, summary, path)
+mark_failed(error)
+to_dict() Dict
}
class StorageManager {
+upload_dir: Path
+output_dir: Path
+save_uploaded_file(data, filename, job_id) str
+validate_file(filename, size) tuple
+cleanup_job_files(upload, output)
+cleanup_expired_files() int
}
JobManager --> Job : manages
JobManager --> StorageManager : uses
Key Operations:
Job Creation Flow:
- Validate file (extension, size, name)
- Create Job instance with unique UUID
- Save file to disk via StorageManager
- Add job to in-memory registry
- Enqueue job ID for processing
- Return job to API endpoint
User Isolation:
- Each job tagged with user_id from authenticated token
- get_user_jobs() filters by user_id
- Users can only see/access their own jobs
- WebSocket broadcasts filtered by user
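The creation and isolation rules above can be sketched with a small in-memory registry. This is an illustrative stand-in rather than the actual JobManager: the trimmed-down Job record, the InMemoryRegistry class, and the get_job_for_user helper are hypothetical names introduced only for this example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    """Hypothetical, trimmed-down job record (the real model has more fields)."""
    user_id: str
    filename: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class InMemoryRegistry:
    """Illustrative stand-in for the in-memory job registry."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}

    def create_job(self, filename: str, user_id: str) -> Job:
        # Tag the job with the authenticated user's ID at creation time.
        job = Job(user_id=user_id, filename=filename)
        self.jobs[job.id] = job
        return job

    def get_user_jobs(self, user_id: str) -> List[Job]:
        # Only jobs tagged with the caller's user_id are returned.
        return [job for job in self.jobs.values() if job.user_id == user_id]

    def get_job_for_user(self, job_id: str, user_id: str) -> Optional[Job]:
        # Another user's job is reported as missing rather than forbidden.
        job = self.jobs.get(job_id)
        if job is None or job.user_id != user_id:
            return None
        return job
```

Returning None for a job owned by someone else keeps the lookup from revealing that the job exists at all, which is one reasonable way to enforce the "own jobs only" rule.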
Concurrency Control:
# Semaphore limits concurrent processing
processing_semaphore = asyncio.Semaphore(MAX_CONCURRENT_JOBS)

# Worker acquires semaphore before processing
async with job_manager.processing_semaphore:
    await run_job(job, ws_manager)
Cleanup and Retention:
- Periodic cleanup every hour via periodic_cleanup() task
- Removes jobs older than FILE_RETENTION_HOURS (24h default)
- Cleans up orphaned upload/output files
- Preserves active and recent jobs
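A minimal sketch of the retention rule, assuming each job carries a creation timestamp and a phase string. The dictionary layout, the phase names in ACTIVE_PHASES, and the function name find_expired_jobs are assumptions made for illustration; only the 24-hour default comes from the text above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

ACTIVE_PHASES = {"queued", "processing"}  # hypothetical phase names


def find_expired_jobs(
    jobs: Dict[str, dict],
    now: datetime,
    retention_hours: int = 24,
) -> List[str]:
    """Return IDs of inactive jobs created before the retention cutoff."""
    cutoff = now - timedelta(hours=retention_hours)
    return [
        job_id
        for job_id, job in jobs.items()
        # Active jobs are preserved regardless of age.
        if job["phase"] not in ACTIVE_PHASES and job["created_at"] < cutoff
    ]
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the rule easy to test.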
3. Storage Manager (server/jobs/storage.py)
Purpose: Safe file operations with validation and cleanup
Directory Structure:
server/data/
├── uploads/ # Uploaded documents (temporary)
│ └── {job_id}_{sanitized_filename}.{ext}
└── outputs/ # Generated CSV files
└── {sanitized_basename}-{timestamp}.csv
File Operations:
- Validation: Extension whitelist, size limits, filename sanitization
- Safe Naming: Job ID prefix to prevent collisions
- Async I/O: Uses run_in_executor() to avoid blocking event loop
- Automatic Cleanup: Removes files older than retention period
Security Features:
- Filename sanitization removes special characters
- Length limits prevent path traversal
- Extension whitelist: .pdf, .pptx, .docx, .xlsx, .ppt, .doc, .xls
- No execution of uploaded files
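The validation and sanitization rules above could look roughly like the following. This is a sketch under stated assumptions, not the code in storage.py: the exact character set, the MAX_STEM_LENGTH value, and the return shape of validate_file are hypothetical; the extension whitelist and 200MB limit are taken from this document.

```python
import re
from pathlib import Path
from typing import Tuple

ALLOWED_EXTENSIONS = {".pdf", ".pptx", ".docx", ".xlsx", ".ppt", ".doc", ".xls"}
MAX_SIZE_BYTES = 200 * 1024 * 1024  # 200MB upload limit
MAX_STEM_LENGTH = 100               # hypothetical length limit


def sanitize_filename(filename: str) -> str:
    """Drop directory components and replace characters outside [A-Za-z0-9._-]."""
    name = Path(filename.replace("\\", "/")).name  # discards "../" and folders
    suffix = Path(name).suffix.lower()
    stem = re.sub(r"[^A-Za-z0-9._-]", "_", Path(name).stem)
    stem = stem.strip("._")[:MAX_STEM_LENGTH] or "file"
    return f"{stem}{suffix}"


def validate_file(filename: str, size: int) -> Tuple[bool, str]:
    """Return (is_valid, error_message) for an upload candidate."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"Unsupported extension: {ext or '(none)'}"
    if size <= 0 or size > MAX_SIZE_BYTES:
        return False, "File is empty or exceeds the size limit"
    return True, ""
```

In this sketch, taking only the final path component is what actually blocks traversal attempts such as `../../etc/passwd.pdf`; the character filter and length cap are additional hardening.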
4. Job Processing Pipeline (server/runners/job_runner.py)
Architecture: Background worker pool processing jobs from async queue
Worker Pool:
# Configurable number of workers (default: 5)
workers = []
for i in range(num_workers):
    worker = asyncio.create_task(
        process_job_queue(job_manager, ws_manager),
        name=f"job-worker-{i}"
    )
    workers.append(worker)
Job Execution Flow:
1. Worker pulls job_id from queue (blocking until available)
2. Acquire processing semaphore (concurrency limit)
3. Create ProgressReporter for WebSocket updates
4. Execute run_job(job, ws_manager)
├─ Phase 1: Extract content (LlamaParser)
├─ Phase 2: Parallel multi-model analysis
├─ Phase 3: Consolidate results
├─ Phase 4: Generate CSV
└─ Phase 5: Mark completed/failed
5. Release semaphore
6. Mark queue task as done
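The execution flow above can be reproduced in a self-contained form with only the standard library. The sketch below replaces run_job with a short sleep and uses hypothetical names (worker, run_pool, max_concurrent); it illustrates the queue, semaphore, and task_done pattern rather than the real job runner.

```python
import asyncio
from typing import List


async def worker(queue: asyncio.Queue, semaphore: asyncio.Semaphore, done: List[str]) -> None:
    while True:
        job_id = await queue.get()         # waits until a job is available
        try:
            async with semaphore:          # concurrency limit
                await asyncio.sleep(0.01)  # placeholder for run_job(...)
                done.append(job_id)
        finally:
            queue.task_done()              # always mark the queue item as done


async def run_pool(job_ids: List[str], num_workers: int = 5, max_concurrent: int = 2) -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    semaphore = asyncio.Semaphore(max_concurrent)
    done: List[str] = []
    workers = [
        asyncio.create_task(worker(queue, semaphore, done), name=f"job-worker-{i}")
        for i in range(num_workers)
    ]
    for job_id in job_ids:
        queue.put_nowait(job_id)
    await queue.join()                     # wait until every job is processed
    for task in workers:
        task.cancel()                      # stop the now-idle workers
    await asyncio.gather(*workers, return_exceptions=True)
    return done
```

Calling `asyncio.run(run_pool(["a", "b", "c"]))` processes every job while never running more than `max_concurrent` at once. Putting `task_done()` in a `finally` block ensures `queue.join()` cannot hang when a job raises.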
Progress Reporting:
- Each phase reports progress percentage (0-100)
- Provider-specific updates (started, success, error, tokens, cost)
- Real-time log streaming to WebSocket clients
- Automatic error capture and reporting
5. LLM Service Layer (core/llm_service/)
Design Pattern: Provider abstraction with async parallel execution
Provider Hierarchy:
classDiagram
class BaseLLMProvider {
<<abstract>>
+api_key: str
+model_name: str
+generate_response(messages, schema)* LLMResponse
+validate_config()* bool
+estimate_cost(input, output, cached)* float
+get_max_tokens()* int
+prepare_messages(system, user) List
}
class OpenAIProvider {
+reasoning_effort: str
+timeout: int
+client: AsyncOpenAI
+generate_response() LLMResponse
+set_reasoning_effort(effort)
-_create_pydantic_model(schema)
-_save_debug_response()
}
class AnthropicProvider {
+thinking_budget: int
+temperature: float
+client: AsyncAnthropic
+generate_response() LLMResponse
-_two_call_approach()
-_convert_to_tool_schema()
}
class GoogleProvider {
+thinking_budget: int
+temperature: float
+client: genai.Client
+generate_response() LLMResponse
-_convert_to_gemini_schema()
}
BaseLLMProvider <|-- OpenAIProvider
BaseLLMProvider <|-- AnthropicProvider
BaseLLMProvider <|-- GoogleProvider
Common Interface:
async def generate_response(
messages: List[Dict[str, str]],
schema: Optional[Dict[str, Any]] = None,
**kwargs
) -> LLMResponse
Provider-Specific Features:
OpenAI (openai_provider.py):
- Uses client.responses.parse() API for structured output
- Configurable reasoning effort: high, medium, low, minimal
- Native Pydantic model support via text_format parameter
- Automatic retry with exponential backoff (max_retries: 2)
- Timeout: 3600 seconds (1 hour) for long documents
- Two-stage validation: check output_parsed, fallback to choices[0].message.content
Anthropic (anthropic_provider.py):
- Two-call approach due to thinking mode incompatibility with structured output:
- Call A: Extended thinking with analysis (no forced tools)
- Call B: Structured JSON formatting (no thinking)
- Thinking budget: 12,000 tokens (configurable)
- Temperature: 1.0 for creative analysis
- Max tokens: 64,000 (Claude Sonnet 4), 32,000 (Claude Opus 4.1)
- Schema conversion to Anthropic tool format
Google (google_provider.py):
- Uses new google-genai SDK with client.aio async methods
- Native thinking support with configurable budget (12,000 tokens)
- Schema conversion to Gemini response_schema format
- Largest context window: 1M tokens (Gemini 2.5 Pro)
- Temperature: 0.7 for balanced creativity/consistency
Provider Manager (provider_manager.py):
- Coordinates parallel execution across multiple providers
- Uses asyncio.gather() for true concurrent API calls
- Implements minimum success threshold (default: 1 model must succeed)
- Tracks per-provider timing, tokens, and costs
- Handles partial failures gracefully
Parallel Execution Example:
# All models process simultaneously
responses = await provider_manager.execute_parallel_analysis(
    model_keys=['openai-gpt5', 'anthropic-sonnet4', 'google-gemini25'],
    messages=analysis_messages,
    schema=UNIVERSAL_BASE_DELIVERABLE_SCHEMA,
    minimum_success_threshold=1
)
# Total time = slowest model, not the sum of all models
# Example: max(GPT-5 110s, Claude 78s, Gemini 49s) = 110s total (not 237s)
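How partial failures and the minimum success threshold interact can be illustrated with stand-in providers. The function run_with_threshold and the fake_ok / fake_fail coroutines are hypothetical; they are not the provider manager's API, only a sketch of the asyncio.gather pattern it is described as using.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def run_with_threshold(
    providers: Dict[str, Callable[[], Awaitable[dict]]],
    minimum_success_threshold: int = 1,
) -> Dict[str, dict]:
    keys = list(providers)
    results = await asyncio.gather(
        *(providers[key]() for key in keys),
        return_exceptions=True,  # a failing provider must not cancel the others
    )
    successes = {
        key: result
        for key, result in zip(keys, results)
        if not isinstance(result, BaseException)
    }
    if len(successes) < minimum_success_threshold:
        raise RuntimeError(f"Only {len(successes)} provider(s) succeeded")
    return successes


async def fake_ok() -> dict:
    await asyncio.sleep(0.01)  # simulated API latency
    return {"assets": []}


async def fake_fail() -> dict:
    raise TimeoutError("simulated provider timeout")
```

With `{"model-a": fake_ok, "model-b": fake_fail}` and a threshold of 1, the call returns only the result for `model-a`; if every provider fails, it raises instead of handing an empty result to consolidation.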
6. Consolidation System (core/consolidation_processor.py)
Purpose: Intelligently merge results from multiple AI models into optimal dataset
Consolidation Strategy:
- Inclusion Bias: If any model found a legitimate deliverable, include it
- Normalization: Canonicalize titles, categories, specifications before deduplication
- Smart Deduplication: Merge only when core identity matches (not just similar text)
- Quality Enhancement: Combine best specifications from all contributing models
Process Flow:
Input: [GPT-5 Result, Claude Result, Gemini Result]
│
├─ Format results as comparison prompt
├─ Load consolidation strategy template
└─ Execute with consolidation model (GPT-5 or Claude Opus)
│
├─ Returns: Consolidated base deliverables
├─ Validation: Ensure 'assets' key present
└─ Expansion: Generate individual assets from multipliers
Multiplier Expansion:
- Base deliverables have multiplier arrays (e.g., 5 sizes × 3 markets = 15 assets)
- Uses itertools.product() for Cartesian product expansion
- Validates: technical_specifications × language_country_market ≈ quantity
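The expansion step can be sketched as follows. The field names technical_specifications, language_country_market, and quantity come from the validation rule above; the title key, the shape of the output rows, and the function name expand_deliverable are assumptions made for this example.

```python
from itertools import product
from typing import Dict, List


def expand_deliverable(base: Dict) -> List[Dict]:
    """Expand one base deliverable into one asset per (spec, market) pair."""
    # An empty or missing multiplier array still yields a single asset.
    specs = base.get("technical_specifications") or [None]
    markets = base.get("language_country_market") or [None]
    return [
        {"title": base["title"], "technical_specification": spec, "market": market}
        for spec, market in product(specs, markets)
    ]


banner = {
    "title": "Display Banner",
    "technical_specifications": ["300x250", "728x90", "160x600", "320x50", "970x250"],
    "language_country_market": ["en-US", "de-DE", "fr-FR"],
    "quantity": 15,
}
assets = expand_deliverable(banner)
print(len(assets))  # 15
```

Five sizes across three markets yield 15 assets, which matches the stated quantity; a mismatch between the product and `quantity` is the signal the validation step looks for.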
7. Authentication System (server/auth/)
Architecture: MSAL-based SSO with development mode bypass
Components:
MSALAuthenticator (msal_auth.py):
- Validates JWT tokens from Microsoft Azure AD
- Supports PKCE flow (Public Client - no client secret required)
- Extracts user claims: oid, preferred_username, name, roles
- Token expiration checking via exp claim
- Audience validation (accepts Microsoft Graph audience)
Middleware (middleware.py):
- @dev_mode_bypass: Creates mock user in development, validates in production
- @auth_required: Strict authentication enforcement
- @optional_auth: Extracts user if present but doesn't require authentication
- get_user_id(): Safely extracts user ID from request context
Token Flow:
1. Frontend obtains token via MSAL.js redirect flow
2. Token stored in localStorage
3. Frontend sends token in Authorization header: "Bearer <jwt>"
4. Backend middleware extracts and validates token
5. User info stored in request context (g.current_user)
6. Endpoints access user via get_user_id()
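Steps 3 and 4 of the token flow can be sketched with the standard library alone. The two helpers below are hypothetical and deliberately incomplete: claims_are_acceptable assumes the token's signature has already been verified and its claims decoded elsewhere (for example with PyJWT), and it must never be used as a substitute for signature verification.

```python
import time
from typing import Iterable, Optional


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <jwt>' header value."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def claims_are_acceptable(
    claims: dict,
    accepted_audiences: Iterable[str],
    now: Optional[float] = None,
) -> bool:
    """Check exp and aud on claims whose signature was ALREADY verified."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False  # missing or expired
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    accepted = set(accepted_audiences)
    return any(a in accepted for a in audiences)
```

For example, `extract_bearer_token("Bearer abc.def.ghi")` returns `"abc.def.ghi"`, while a missing header or a different scheme returns `None` so the middleware can respond with 401.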
Development Mode:
- DEV_MODE=true: Bypasses MSAL, creates mock user
- DEV_MODE=false: Requires valid Microsoft account authentication
- Never use DEV_MODE in production!
8. WebSocket Manager (server/ws/manager.py)
Pattern: Singleton with an async lock guarding shared state across concurrent tasks
Architecture:
from datetime import datetime
from uuid import UUID

class WebSocketClient:
    client_id: UUID         # Unique connection identifier
    user_id: str            # Authenticated user ID (for targeting)
    connected_at: datetime  # Connection timestamp
    last_ping: datetime     # Heartbeat tracking
    websocket: object       # Quart WebSocket object
Connection Lifecycle:
- Client connects to /ws?token=<jwt>
- Backend validates token and extracts user ID
- Create WebSocketClient and register in manager
- Send connection.established acknowledgment
- Send initial queue.snapshot with user's jobs
- Enter message loop (ping/pong, handle client messages)
- On disconnect: unregister client
Broadcasting Modes:
- broadcast_to_all(): Send to all connected clients
- broadcast_to_user(user_id): Send to specific user's connections
- broadcast_job_update(job_id): Send job-specific updates (currently broadcasts to all)
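User-targeted broadcasting can be sketched by indexing connections by user ID. The ConnectionRegistry class and the Send callable type are hypothetical stand-ins for the manager and the Quart WebSocket object, and the message payload shown in the usage note is illustrative rather than the documented wire format.

```python
import asyncio
import json
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List

Send = Callable[[str], Awaitable[None]]  # stand-in for a WebSocket's send method


class ConnectionRegistry:
    """Illustrative per-user connection index."""

    def __init__(self) -> None:
        self._by_user: Dict[str, List[Send]] = defaultdict(list)

    def register(self, user_id: str, send: Send) -> None:
        self._by_user[user_id].append(send)

    async def broadcast_to_user(self, user_id: str, message: dict) -> int:
        """Send to every connection of one user; return the delivery count."""
        payload = json.dumps(message)
        senders = list(self._by_user.get(user_id, []))
        results = await asyncio.gather(
            *(send(payload) for send in senders),
            return_exceptions=True,  # one broken socket must not block the rest
        )
        return sum(1 for r in results if not isinstance(r, BaseException))
```

A call such as `await registry.broadcast_to_user("alice", {"type": "job.progress", "progress": 25})` reaches only the connections registered for that user, which is the behaviour job-specific updates would need in order to honour user isolation.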
Background Tasks:
- Ping Loop: Sends ping every 30 seconds to keep connections alive
- Cleanup Loop: Removes stale connections (no ping for 90+ seconds)
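The stale-connection rule can be expressed as a pure function over last-ping timestamps. The function name find_stale_clients and the dictionary layout are assumptions; the 90-second threshold is the one stated above.

```python
from datetime import datetime, timedelta
from typing import Dict, List

STALE_AFTER = timedelta(seconds=90)  # threshold from the cleanup loop description


def find_stale_clients(last_ping: Dict[str, datetime], now: datetime) -> List[str]:
    """Return client IDs whose last ping is 90 or more seconds old."""
    return [
        client_id
        for client_id, seen in last_ping.items()
        if now - seen >= STALE_AFTER
    ]
```

With a 30-second ping interval, a 90-second threshold tolerates two missed pings before a connection is dropped.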
Message Types:
- connection.established: Initial handshake
- queue.snapshot: Full job list on connect
- job.created: New job added
- job.accepted: Job entered processing queue
- job.progress: Phase and progress percentage updates
- job.provider_update: Per-model status (started, tokens, cost, error)
- job.log: Real-time log streaming
- job.completed: Processing finished with results
- job.failed: Processing error
- job.deleted: Job removed
- ping/pong: Heartbeat mechanism
9. API Endpoints
Authentication API (/api/auth/*):
| Endpoint | Method | Purpose | Auth Required |
|---|---|---|---|
| /api/auth/config | GET | Get MSAL configuration for frontend | No |
| /api/auth/validate | POST | Validate access token | No |
| /api/auth/user | GET | Get current user info | Yes (bypass in dev) |
| /api/auth/logout | POST | Get logout URL for MSAL | No |
Configuration API (/api/config/*):
| Endpoint | Method | Purpose | Auth Required |
|---|---|---|---|
| /api/config/models | GET | List available AI models | Yes (bypass in dev) |
| /api/config/defaults | GET | Get default model configuration | Yes (bypass in dev) |
| /api/config/estimate | POST | Estimate processing cost | Yes (bypass in dev) |
| /api/config/validate | POST | Validate model configuration | Yes (bypass in dev) |
| /api/config/system | GET | Get system information | Yes (bypass in dev) |
Jobs API (/api/jobs/*):
| Endpoint | Method | Purpose | Auth Required |
|---|---|---|---|
| /api/jobs | POST | Create new jobs (multipart file upload) | Yes (bypass in dev) |
| /api/jobs | GET | List user's jobs (paginated) | Yes (bypass in dev) |
| /api/jobs/{id} | GET | Get specific job details | Yes (bypass in dev) |
| /api/jobs/{id} | DELETE | Delete job and files | Yes (bypass in dev) |
| /api/jobs/{id}/download | GET | Download CSV result (binary) | Yes (bypass in dev) |
| /api/jobs/{id}/logs | GET | Get job logs (paginated) | Yes (bypass in dev) |
| /api/jobs/batch-download | POST | Download multiple CSVs as ZIP | Yes (bypass in dev) |
| /api/jobs/stats | GET | Get job statistics for user | Yes (bypass in dev) |
| /api/jobs/cleanup | POST | Clean up expired jobs | Yes (bypass in dev) |
Health Endpoint:
- /health (GET): System health with queue stats, WebSocket connections, config info
WebSocket Endpoint:
- /ws (WebSocket): Real-time bidirectional communication with token-based auth
Frontend Architecture
Technology Stack
Core Framework: React 18.2 with TypeScript 5.2
- Chosen for component-based architecture and type safety
- Concurrent rendering features for smooth UI updates
- Strict mode enabled for development
Build System: Vite 5.0
- Fast HMR (Hot Module Replacement) for development
- Optimized production builds with code splitting
- Environment variable injection at build time
State Management:
- Zustand 4.4: Global client state (jobs, connection status)
- TanStack Query 5.8: Server state management, caching, background refetching
- MSAL React 2.1: Authentication state
UI Framework:
- Tailwind CSS 3.3: Utility-first styling
- Lucide React: Icon system (tree-shakeable)
- Custom Components: Reusable UI library
Key Dependencies:
- react, react-dom - Core React libraries
- @azure/msal-browser, @azure/msal-react - Microsoft authentication
- axios - HTTP client with interceptors
- zustand - State management
- @tanstack/react-query - Server state and caching
- lucide-react - Icon components
- tailwind-merge, clsx - Dynamic className utilities
Directory Structure
graph LR
subgraph Frontend["frontend/src/ - React Application"]
Main["main.tsx<br/>Entry point"]
App["App.tsx<br/>Root component"]
subgraph Components["components/"]
subgraph AuthComp["auth/"]
AuthProvider["AuthProvider.tsx<br/>MSAL wrapper"]
AuthGuard["AuthGuard.tsx<br/>Route protection"]
LoginPage["LoginPage.tsx<br/>Login UI"]
end
subgraph UploadComp["upload/"]
UploadPanel["UploadPanel.tsx<br/>File upload"]
ModelSelector["ModelSelector.tsx<br/>Model config"]
CostEstimator["CostEstimator.tsx<br/>Cost preview"]
end
subgraph QueueComp["queue/"]
QueueView["QueueView.tsx<br/>Job list"]
JobCard["JobCard.tsx<br/>Job summary"]
JobAccordion["JobAccordion.tsx<br/>Details view"]
ProviderChips["ProviderChips.tsx<br/>Status badges"]
end
subgraph UIComp["ui/"]
Button["Button.tsx"]
Card["Card.tsx"]
ProgressBar["ProgressBar.tsx"]
end
Dashboard["Dashboard.tsx<br/>Main layout"]
end
subgraph Services["services/"]
APIClient["api.ts<br/>Axios client"]
WSClient["websocket.ts<br/>WS client"]
end
subgraph Stores["store/"]
AuthStore["authStore.ts<br/>Auth state"]
JobStore["jobStore.ts<br/>Job state"]
end
subgraph Hooks["hooks/"]
UseJobs["useJobs.ts"]
UseWS["useWebSocket.ts"]
end
Types["types/api.ts<br/>TypeScript defs"]
end
Main --> App
App --> Components
Components --> Services
Components --> Stores
Components --> Hooks
Services --> Types
Stores --> Types
style Frontend fill:#fff3e0
style Components fill:#e1f5fe
style Services fill:#f3e5f5
style Stores fill:#e8f5e9
State Management Architecture
Authentication Store (store/authStore.ts)
Zustand Store with Persistence:
interface AuthState {
  isAuthenticated: boolean
  user: User | null
  authConfig: AuthConfig | null
  isLoading: boolean
  error: string | null

  // Actions
  login(accessToken: string): Promise<void>
  logout(): Promise<void>
  checkAuth(): Promise<void>
}
Key Features:
- Persists to localStorage (excludes sensitive data)
- Automatic token validation on mount
- Handles MSAL redirect responses
- Manages logout flow with Azure AD
Login Flow:
stateDiagram-v2
[*] --> Unauthenticated
Unauthenticated --> InitAuth: Load app
InitAuth --> GetConfig: GET /api/auth/config
GetConfig --> LoginScreen: Show login page
LoginScreen --> MSALRedirect: User clicks login
MSALRedirect --> AzureLogin: Redirect to Microsoft
AzureLogin --> Authenticating: User enters credentials
Authenticating --> MFACheck: Credentials valid
MFACheck --> TokenExchange: MFA complete (if required)
TokenExchange --> RedirectBack: Auth code received
RedirectBack --> HandleRedirect: MSAL.handleRedirectPromise()
HandleRedirect --> ValidateToken: POST /api/auth/validate
ValidateToken --> Authenticated: Token valid
ValidateToken --> LoginScreen: Token invalid
Authenticated --> ConnectWS: Connect WebSocket
ConnectWS --> Dashboard: Show main UI
Dashboard --> [*]
note right of TokenExchange
PKCE flow
No client secret
end note
note right of ValidateToken
Backend checks:
- exp claim
- aud claim
- Extracts user ID
end note
Job Store (store/jobStore.ts)
Zustand Store for Job Queue:
interface JobState {
  jobs: Record<string, Job> // Job registry by ID
  connectionStatus: 'connecting' | 'connected' | 'disconnected' | 'error'
  selectedModels: ModelConfiguration | null
  availableModels: ModelInfo[]

  // Job Management
  addJob(job: Job): void
  updateJob(id: string, updates: Partial<Job>): void
  updateProvider(jobId, modelKey, update): void
  addLog(jobId, logEntry): void
  removeJob(id: string): void

  // WebSocket Connection
  connectWebSocket(): void
  disconnectWebSocket(): void
  setConnectionStatus(status): void

  // Model Configuration
  loadAvailableModels(): Promise<void>
  loadDefaultConfig(): Promise<void>

  // Selectors
  getActiveJobs(): Job[]
  getCompletedJobs(): Job[]
  getFailedJobs(): Job[]
  getJobsByStatus(status): Job[]
}
WebSocket Integration:
- Sets up event handlers for all WebSocket message types
- Updates job state in real-time as messages arrive
- Manages connection status for UI indicators
- Automatically reconnects on disconnect
Component Architecture
Dashboard Component (components/Dashboard.tsx)
Layout Structure:
graph TB
subgraph Dashboard["Dashboard Layout"]
subgraph Header["Header Bar"]
Logo["Logo + Title"]
ConnStatus["Connection Status<br/>Indicator"]
Stats["Quick Stats<br/>(Active/Complete/Failed)"]
UserMenu["User Info + Logout"]
end
subgraph Main["Main Content Area"]
subgraph Upload["Upload Panel"]
FileSelect["Multi-file Selection<br/>(Drag & Drop)"]
ModelConfig["Model Configuration<br/>(Primary + Consolidation)"]
CostEst["Cost Estimation<br/>(Real-time)"]
end
subgraph Queue["Queue View"]
Active["Active Jobs<br/>(Progress Bars)"]
Complete["Completed Jobs<br/>(Download Links)"]
Failed["Failed Jobs<br/>(Error Details)"]
Batch["Batch Actions<br/>(Multi-download)"]
end
end
subgraph Footer["Footer Bar"]
Version["Version Info"]
PoweredBy["AI Model Credits"]
UserInfo["Current User"]
end
end
Header --> Main
Main --> Footer
Upload -.-> Queue
style Header fill:#f5f5f5
style Upload fill:#e3f2fd
style Queue fill:#e8f5e9
style Footer fill:#f5f5f5
Real-Time Features:
- Connection status indicator with manual reconnect button
- Live job count badges (processing, completed, failed)
- Auto-refresh on WebSocket disconnect fallback
Upload Panel (components/upload/UploadPanel.tsx)
Features:
- Multi-file drag-and-drop with validation
- Model selector with primary + consolidation configuration
- Real-time cost estimation before upload
- Progress indication during upload
- File size and type validation
Upload Workflow:
stateDiagram-v2
[*] --> FileSelection: User drops/selects files
FileSelection --> Validation: Files selected
Validation --> ModelConfig: Validation passed
Validation --> Error: Validation failed
Error --> FileSelection: Fix issues
ModelConfig --> CostEstimate: Models configured
CostEstimate --> ConfirmUpload: Review cost
ConfirmUpload --> Uploading: User confirms
Uploading --> CreateJobs: POST /api/jobs
CreateJobs --> JobCreated: Backend creates jobs
JobCreated --> QueueUpdate: WebSocket job.created
QueueUpdate --> [*]: Jobs in queue
note right of Validation
Size: max 200MB
Extensions: .pdf, .pptx, .docx, .xlsx
end note
note right of CostEstimate
Real-time calculation
Based on file size + model selection
end note
Queue View (components/queue/QueueView.tsx)
Display Sections:
- Active Jobs: Real-time progress bars, phase indicators, provider chips
- Completed Jobs: Summary stats, download button, expansion details
- Failed Jobs: Error messages, retry capability (future feature)
Job Card Features:
- Expandable accordion for detailed view
- Provider-specific status chips (color-coded)
- Real-time log streaming in expanded view
- Progress percentage with phase labels
- Token usage and cost display
WebSocket Client (services/websocket.ts)
Features:
- Automatic reconnection with exponential backoff
- Connection health monitoring via ping/pong
- Token-based authentication via query parameter
- Event-driven message handling
- Window focus/visibility detection for smart reconnection
Reconnection Strategy:
stateDiagram-v2
[*] --> Disconnected
Disconnected --> Attempt1: Wait 5s
Attempt1 --> Connected: Success
Attempt1 --> Attempt2: Failed
Attempt2 --> Connected: Success
Attempt2 --> Attempt3: Failed (wait 10s)
Attempt3 --> Connected: Success
Attempt3 --> GaveUp: Failed (wait 20s)
GaveUp --> GaveUp: Stop retrying
Connected --> Disconnected: Connection lost
GaveUp --> Attempt1: Window focus
note right of Attempt1
Initial delay: 5s
Max attempts: 3
end note
note right of Attempt2
Exponential backoff
10s delay
end note
note right of Attempt3
Final attempt
20s delay
end note
note right of GaveUp
Manual reconnect only
Health check every 30s
end note
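The delay schedule in the diagram (5s, 10s, 20s, then stop) follows a doubling rule. The sketch below is written in Python for consistency with the backend examples, although the actual client lives in services/websocket.ts; the function name and parameters are hypothetical.

```python
from typing import List


def reconnect_delays(
    initial_delay: float = 5.0,
    factor: float = 2.0,
    max_attempts: int = 3,
) -> List[float]:
    """Delay in seconds before each reconnection attempt, doubling each time."""
    return [initial_delay * factor ** attempt for attempt in range(max_attempts)]


print(reconnect_delays())  # [5.0, 10.0, 20.0]
```

Once the list is exhausted the client stops retrying on its own, and a window focus event or the manual reconnect button restarts the sequence from the first delay.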
Authentication:
// WebSocket URL with token
wss://domain.com/ws?token=<jwt_access_token>
// Backend extracts token from query param
token = websocket.args.get('token')
user_info = await msal_auth.validate_token(token)
client = register_client(user_info['oid'])
Data Flow and Processing Pipeline
Complete Processing Flow
flowchart TD
Start([User Uploads File]) --> Upload[1. FILE UPLOAD<br/>POST /api/jobs multipart/form-data<br/>Files + modelConfig]
Upload --> JobCreate[2. JOB CREATION Backend<br/>- Validate files<br/>- Create Job UUID<br/>- Save to disk<br/>- Add to registry<br/>- Enqueue job_id<br/>- Broadcast: job.created]
JobCreate --> WorkerPull[3. WORKER POOL<br/>Pull from queue<br/>Acquire semaphore]
WorkerPull --> Extract[4. STAGE 1: CONTENT EXTRACTION<br/>Phase: EXTRACT_CONTENT<br/>- LlamaParser API call<br/>- OCR + table detection<br/>- Returns markdown<br/>Progress: 25%]
Extract --> ParallelAnalysis[5. STAGE 2: PARALLEL ANALYSIS<br/>Phase: LLM_ANALYSIS<br/>asyncio.gather simultaneous]
ParallelAnalysis --> GPT5[OpenAI GPT-5<br/>Reasoning: medium<br/>~110 seconds]
ParallelAnalysis --> Claude[Anthropic Sonnet 4<br/>Two-call approach<br/>~78 seconds]
ParallelAnalysis --> Gemini[Google Gemini 2.5<br/>Thinking enabled<br/>~49 seconds]
GPT5 --> Gather[Collect Results<br/>Total time = slowest model 110s<br/>Progress: 75%]
Claude --> Gather
Gemini --> Gather
Gather --> Consolidate[6. STAGE 3: CONSOLIDATION<br/>Phase: CONSOLIDATION<br/>- Format model results<br/>- Load strategy template<br/>- Execute consolidation model<br/>- Smart deduplication<br/>- Validate 'assets' key<br/>Progress: 80%]
Consolidate --> Expand[7. STAGE 4: MULTIPLIER EXPANSION<br/>- Extract multiplier arrays<br/>- Cartesian product<br/>- 3 sizes × 5 markets = 15 assets<br/>- Validate quantity]
Expand --> CSVGen[8. STAGE 5: CSV GENERATION<br/>Phase: CSV_GENERATION<br/>- Convert to CSV rows<br/>- Async file write<br/>- Create JobSummary<br/>- Mark COMPLETED<br/>Progress: 100%]
CSVGen --> WSUpdate[9. WEBSOCKET UPDATE<br/>Broadcast: job.completed<br/>- resultCsvUrl<br/>- summary data]
WSUpdate --> UIUpdate[10. FRONTEND UPDATE<br/>- JobStore updates<br/>- UI re-renders<br/>- Download button active<br/>- Summary displayed]
UIUpdate --> End([Processing Complete])
style ParallelAnalysis fill:#e1f5ff
style GPT5 fill:#10a37f
style Claude fill:#d4a373
style Gemini fill:#4285f4
style Gather fill:#e1f5ff
Parallel Processing Optimization
gantt
title Processing Time Comparison
dateFormat X
axisFormat %S
section Sequential
GPT-5 (110s) :0, 110
Claude (78s) :110, 188
Gemini (49s) :188, 237
Total 237s :milestone, 237, 237
section Parallel
GPT-5 (110s) :0, 110
Claude (78s) :0, 78
Gemini (49s) :0, 49
Total 110s :milestone, 110, 110
Performance Gain:
- Sequential: 237 seconds (sum of all models)
- Parallel: 110 seconds (max of all models)
- Speedup: 2.15x faster
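The figures above follow directly from the per-model latencies in the chart. A minimal sketch of the arithmetic, using the illustrative timings from this section (real latencies vary per document):

```python
# Illustrative per-model latencies in seconds, taken from the chart above.
latencies = {"gpt-5": 110, "claude-sonnet-4": 78, "gemini-2.5-pro": 49}

sequential_total = sum(latencies.values())  # models run one after another
parallel_total = max(latencies.values())    # models run concurrently
speedup = sequential_total / parallel_total

print(f"sequential={sequential_total}s parallel={parallel_total}s speedup={speedup:.2f}x")
```

Because the parallel total equals the slowest model, speeding up the faster models does not shorten the analysis stage.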
Implementation:
# Create tasks for all models
tasks = [
asyncio.create_task(openai_provider.generate_response(...)),
asyncio.create_task(anthropic_provider.generate_response(...)),
asyncio.create_task(google_provider.generate_response(...))
]
# Execute all concurrently
results = await asyncio.gather(*tasks, return_exceptions=True)
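# gather() returns once the slowest model finishes; with return_exceptions=True,
# a failed model appears in results as an exception object instead of raising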
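A self-contained sketch of the same pattern, showing how successes and failures can be separated after the gather call. The fake_provider coroutine and run_models helper are hypothetical stand-ins for the real provider classes, and the delays are shortened for illustration:

```python
import asyncio

async def fake_provider(name: str, delay: float, fail: bool = False) -> dict:
    """Hypothetical stand-in for a provider's generate_response() call."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return {"model": name, "assets": []}

async def run_models() -> tuple:
    names = ["openai-gpt5", "anthropic-sonnet4", "google-gemini25"]
    tasks = [
        asyncio.create_task(fake_provider(names[0], 0.03)),
        asyncio.create_task(fake_provider(names[1], 0.02, fail=True)),
        asyncio.create_task(fake_provider(names[2], 0.01)),
    ]
    # gather() preserves task order, so results line up with names.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    successes = {n: r for n, r in zip(names, results)
                 if not isinstance(r, BaseException)}
    failures = {n: str(r) for n, r in zip(names, results)
                if isinstance(r, BaseException)}
    return successes, failures

successes, failures = asyncio.run(run_models())
```

One failing model does not cancel the others here, which is what makes a minimum-success threshold workable.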
Authentication and Security
Microsoft Azure AD Integration
Authentication Flow: Authorization Code Flow with PKCE (Proof Key for Code Exchange) - Public Client Flow
Why PKCE:
- No client secret required (secure for SPAs)
- More secure than implicit flow
- Recommended by Microsoft for browser-based apps
- Frontend-initiated, backend validates
Complete Authentication Flow:
1. Frontend Initialization:
GET /api/auth/config
└─ Returns: { clientId, authority, redirectUri, devMode }
2. User Clicks "Sign in with Microsoft":
MSAL.loginRedirect({
scopes: ['openid', 'profile', 'User.Read'],
redirectUri: 'https://domain.com/brief-extractor/'
})
3. Redirect to Microsoft Login:
User authenticates with work/school account
MFA if configured in Azure AD
4. Microsoft Redirects Back:
https://domain.com/brief-extractor/?code=...&state=...
5. MSAL Exchanges Code for Token:
- Frontend MSAL library handles token exchange
- Receives access token (JWT)
- Token valid for ~1 hour
6. Frontend Submits Token for Backend Validation:
POST /api/auth/validate
Body: { "accessToken": "<jwt>" }
└─ Backend validates:
- JWT signature (future: using Azure JWKS)
- Expiration (exp claim)
- Audience (aud claim)
- Returns user info if valid
7. Store Token:
localStorage.setItem('accessToken', token)
8. All Subsequent Requests:
Authorization: Bearer <jwt>
├─ API requests: Axios interceptor adds header
└─ WebSocket: Query parameter ?token=<jwt>
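The expiration and audience checks from step 6 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the backend's actual validator: decode_claims and check_claims are hypothetical names, and the sketch deliberately does not verify the JWT signature, which a production validator must do against the Azure AD signing keys.

```python
import base64
import json
import time
from typing import Optional

def decode_claims(token: str) -> dict:
    """Decode the JWT payload segment WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def check_claims(token: str, expected_audience: str,
                 now: Optional[float] = None) -> dict:
    """Return the claims if exp and aud look valid, otherwise raise ValueError."""
    claims = decode_claims(token)
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise ValueError("token expired")
    if claims.get("aud") != expected_audience:
        raise ValueError("unexpected audience")
    return claims
```

The oid claim of the returned dictionary is what the rest of this document refers to as the user ID.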
Security Mechanisms
Transport Security:
- HTTPS enforced in production (HTTPS_ONLY=true)
- WSS (WebSocket Secure) for real-time communication
- Secure cookies in production (SECURE_COOKIES=true)
Authentication Security:
- JWT token validation on every request
- Token expiration enforcement (Azure AD TTL: ~1 hour)
- No client secrets in frontend code (PKCE flow)
- Automatic logout on 401 responses
Authorization:
- User ID extraction from validated JWT (oid claim)
- All jobs tagged with user_id
- API endpoints filter by user_id
- Users cannot access other users' jobs/files
Input Validation:
- File extension whitelist
- File size limits (200MB default)
- Filename sanitization (remove special chars)
- Request payload size limits
- CORS restrictions to allowed origins
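The upload checks above could look roughly like the following. The extension whitelist is a hypothetical placeholder (this section does not list the allowed types), and sanitize_filename and validate_upload are illustrative names rather than the backend's actual functions:

```python
import re
from pathlib import PurePath

# Hypothetical whitelist; the real list lives in the backend configuration.
ALLOWED_EXTENSIONS = {".pdf", ".pptx", ".docx"}
MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # 200MB default from this section

def sanitize_filename(name: str) -> str:
    """Drop any directory part and replace characters outside [A-Za-z0-9._-]."""
    base = PurePath(name.replace("\\", "/")).name
    return re.sub(r"[^A-Za-z0-9._-]", "_", base)

def validate_upload(name: str, size_bytes: int) -> str:
    """Return the sanitized filename, or raise ValueError if a check fails."""
    safe = sanitize_filename(name)
    if PurePath(safe).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("unsupported file type")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file_too_large")
    return safe
```

Stripping the directory part before sanitizing keeps path traversal sequences such as ../ out of the stored filename.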
Data Isolation:
- Per-user job filtering in all endpoints
- WebSocket broadcasts filtered by user_id
- File storage uses job-specific UUIDs
- No shared state between users
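Filtering broadcasts by user can be pictured as a registry keyed by user ID. The ClientRegistry class below is a hypothetical in-memory sketch; it appends to per-client lists where a real server would write to a socket:

```python
from collections import defaultdict

class ClientRegistry:
    """Hypothetical in-memory registry mapping user IDs to connected clients."""

    def __init__(self) -> None:
        self._clients = defaultdict(dict)  # user_id -> {client_id: outbox}

    def register(self, user_id: str, client_id: str) -> None:
        # Each client gets its own outbox; a real server would hold a socket here.
        self._clients[user_id][client_id] = []

    def broadcast(self, user_id: str, message: dict) -> int:
        """Deliver message only to clients owned by user_id; return the count."""
        outboxes = self._clients.get(user_id, {})
        for outbox in outboxes.values():
            outbox.append(message)
        return len(outboxes)

    def outbox(self, user_id: str, client_id: str) -> list:
        return self._clients[user_id][client_id]
```

Because the lookup starts from the user ID, a message can never reach a client registered under a different user.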
Development Mode Security:
# DEV_MODE bypasses authentication - NEVER use in production!
if DEV_MODE:
    # Creates mock user without validation
    return {'oid': 'dev-user-id', 'name': 'Development User'}
else:
    # Full JWT validation required
    validate_token(access_token)
Azure AD Configuration Requirements
App Registration Settings:
- Platform: Single-page application
- Redirect URI: https://ai-sandbox.oliver.solutions/brief-extractor/
- Supported Account Types: Single tenant or multi-tenant
- API Permissions: Microsoft Graph → User.Read (delegated)
- Token Configuration: ID tokens enabled for implicit flow
Required Environment Variables (Backend):
MSAL_CLIENT_ID=<from Azure Portal>
MSAL_TENANT_ID=<from Azure Portal>
MSAL_AUTHORITY=https://login.microsoftonline.com/<tenant_id>
MSAL_REDIRECT_URI=https://ai-sandbox.oliver.solutions/brief-extractor/
DEV_MODE=false # CRITICAL: Must be false in production
WebSocket Real-Time Communication
Architecture
Connection Model: Persistent bidirectional communication
- Uses native WebSocket API (browser) and Quart websocket (server)
- One connection per browser tab (a user with several open tabs holds several connections)
- Automatic reconnection on network failures
Message Protocol
Message Structure:
{
"type": "message_type",
"timestamp": "2025-10-07T17:45:08.015Z",
"jobId": "uuid",
"...": "message-specific fields"
}
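A server-side helper for producing this envelope might look like the sketch below. make_message is a hypothetical name; the timestamp format (UTC, millisecond precision, trailing Z) is inferred from the examples in this section:

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional

def make_message(msg_type: str, job_id: Optional[str] = None, **fields: Any) -> str:
    """Serialize a server-to-client message with the common envelope fields."""
    now = datetime.now(timezone.utc)
    message = {
        "type": msg_type,
        # ISO 8601 with millisecond precision and a trailing Z.
        "timestamp": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }
    if job_id is not None:
        message["jobId"] = job_id
    message.update(fields)  # message-specific fields
    return json.dumps(message)

raw = make_message("job.progress", job_id="uuid", phase="LLM_ANALYSIS", progressPct=45)
decoded = json.loads(raw)
```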
Message Types (Server → Client):
Connection Management:
{
"type": "connection.established",
"clientId": "uuid",
"userId": "user-oid",
"connectedAt": "2025-10-07T17:40:00.000Z"
}
Queue Snapshot (sent on connect):
{
"type": "queue.snapshot",
"jobs": [Job, Job, ...] // All user's jobs
}
Job Lifecycle:
// Job created
{
"type": "job.created",
"job": {Job object}
}
// Job accepted into queue
{
"type": "job.accepted",
"jobId": "uuid"
}
// Progress update
{
"type": "job.progress",
"jobId": "uuid",
"phase": "LLM_ANALYSIS",
"progressPct": 45,
"stepLabel": "Analyzing with Claude Sonnet 4",
"providerUpdates": {
"openai-gpt5": {
"status": "success",
"tokensIn": 5000,
"tokensOut": 3000,
"costUsd": 0.045,
"latencyMs": 85000
}
}
}
// Provider-specific update
{
"type": "job.provider_update",
"jobId": "uuid",
"modelKey": "anthropic-sonnet4",
"update": {
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"status": "success",
"tokensIn": 6000,
"tokensOut": 2500,
"costUsd": 0.055,
"latencyMs": 78000
}
}
// Real-time log entry
{
"type": "job.log",
"jobId": "uuid",
"logEntry": {
"timestamp": "2025-10-07T17:42:46.474Z",
"level": "INFO",
"message": "Consolidation completed: 9 base deliverables"
}
}
// Job completion
{
"type": "job.completed",
"jobId": "uuid",
"resultCsvUrl": "/api/jobs/{uuid}/download",
"summary": {
"docType": "presentation",
"assetsExtracted": 19,
"confidenceScore": 0.95,
"costUsdTotal": 0.2759,
"tokensTotal": 41322,
"processingTimeSeconds": 291.1
}
}
// Job failure
{
"type": "job.failed",
"jobId": "uuid",
"error": "Consolidation failed: Response missing 'assets' key"
}
// Job deleted
{
"type": "job.deleted",
"jobId": "uuid"
}
Heartbeat:
// Server → Client (every 30s)
{
"type": "ping",
"timestamp": "2025-10-07T17:45:00.000Z"
}
// Client → Server
{
"type": "pong"
}
Frontend WebSocket Implementation
Connection Management (services/websocket.ts):
class WebSocketClient {
  private ws: WebSocket | null = null
  private reconnectInterval = 5000       // ms, initial delay
  private maxReconnectInterval = 60000   // ms, upper bound for backoff
  private maxReconnectAttempts = 3
  private reconnectAttempts = 0
  connect() {
    const wsUrl = `${VITE_WS_URL}/ws?token=${accessToken}`
    this.ws = new WebSocket(wsUrl)
    this.ws.onopen = () => {
      // Reset reconnection counters
      // Start ping interval (30s)
      // Notify connection handlers
    }
    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data)
      this.handleMessage(message) // Route to event handlers
    }
    this.ws.onclose = () => {
      // Stop ping interval
      // Schedule reconnection (if not intentional)
    }
  }
  scheduleReconnect() {
    if (this.reconnectAttempts >= this.maxReconnectAttempts) {
      // Stop trying after 3 attempts
      return
    }
    this.reconnectAttempts += 1
    setTimeout(() => this.connect(), this.reconnectInterval)
    // Exponential backoff, capped at maxReconnectInterval
    this.reconnectInterval = Math.min(this.reconnectInterval * 2, this.maxReconnectInterval)
  }
}
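The delay schedule implied by these settings can be checked with a few lines of Python. reconnect_delays is a hypothetical helper written only to illustrate the doubling rule; the defaults mirror the values in the client above:

```python
def reconnect_delays(initial_ms: int = 5000, max_ms: int = 60000,
                     max_attempts: int = 3) -> list:
    """Return the wait before each reconnection attempt, in milliseconds."""
    delays = []
    delay = initial_ms
    for _ in range(max_attempts):
        delays.append(min(delay, max_ms))  # never wait longer than the cap
        delay *= 2                         # exponential backoff
    return delays

print(reconnect_delays())
```

With three attempts the 60-second cap is never reached; it only matters if the attempt limit is raised.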
State Updates via Zustand:
// Event handlers registered in jobStore
wsClient.on('job.progress', (message) => {
  updateJob(message.jobId, {
    phase: message.phase,
    progressPct: message.progressPct,
    stepLabel: message.stepLabel,
    providerUpdates: message.providerUpdates
  })
  // React components re-render automatically
})
Connection Resilience
Auto-Reconnection Scenarios:
- Network interruption
- Server restart
- Apache/Nginx reload
- Temporary backend unavailability
Smart Reconnection:
- Window focus event: Reset attempts, immediate reconnect
- Page visibility change: Reduce penalty, try reconnect
- Health check: Ping /health every 30s when disconnected
- Connection restoration: Resume from last state
Fallback Without WebSocket:
- App remains fully functional
- Users can still upload, view, download
- Progress updates require manual page refresh
- No real-time log streaming (logs available on demand)
API Reference
Authentication API
GET /api/auth/config
Get MSAL configuration for frontend initialization.
Request: None (no auth required)
Response:
{
"config": {
"clientId": "9079054c-9620-4757-a256-23413042f1ef",
"authority": "https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385",
"redirectUri": "https://ai-sandbox.oliver.solutions/brief-extractor/",
"devMode": false
},
"devMode": false
}
POST /api/auth/validate
Validate an access token and return user information.
Request:
{
"accessToken": "eyJ0eXAiOiJKV1QiLCJhbGc..."
}
Response (Success):
{
"valid": true,
"user": {
"id": "38abcbd2-7558-4f64-aec2-fafc7807552c",
"username": "user@domain.com",
"name": "User Name",
"roles": ["user"]
}
}
Response (Invalid):
{
"valid": false,
"error": "invalid_token",
"message": "Token is invalid or expired"
}
GET /api/auth/user
Get current authenticated user information.
Headers: Authorization: Bearer <token>
Response:
{
"user": {
"id": "38abcbd2-7558-4f64-aec2-fafc7807552c",
"username": "user@domain.com",
"name": "User Name",
"roles": ["user"]
}
}
POST /api/auth/logout
Get logout URL for proper Microsoft session termination.
Request:
{
"redirectUri": "https://ai-sandbox.oliver.solutions/brief-extractor/"
}
Response:
{
"logoutUrl": "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/logout?post_logout_redirect_uri=..."
}
Jobs API
POST /api/jobs
Create new processing jobs from uploaded files.
Headers:
- Authorization: Bearer <token>
- Content-Type: multipart/form-data
Request Body:
- file_0, file_1, ...: File uploads
- modelConfig (optional): JSON string with model configuration
Model Config Structure:
{
"primaryModels": ["openai-gpt5", "anthropic-sonnet4", "google-gemini25"],
"consolidationModel": "openai-gpt5",
"minimumSuccessThreshold": 1
}
Response:
{
"jobs": [
{
"id": "4614818d-38c6-4eac-aa39-659c89d90836",
"fileName": "brief.pdf",
"fileSize": 1048576,
"createdAt": "2025-10-07T17:40:00.000Z",
"updatedAt": "2025-10-07T17:40:00.000Z",
"userId": "38abcbd2-7558-4f64-aec2-fafc7807552c",
"phase": "QUEUED",
"progressPct": 0,
"stepLabel": "Queued for processing",
"providerUpdates": {},
"error": null,
"resultCsvUrl": null,
"summary": null,
"logs": [],
"modelConfig": {
"primaryModels": ["openai-gpt5", "anthropic-sonnet4", "google-gemini25"],
"consolidationModel": "openai-gpt5",
"minimumSuccessThreshold": 1
}
}
],
"errors": []
}
Error Responses:
// No files
{
"error": "no_files",
"message": "No files provided for upload"
}
// Invalid model config
{
"error": "invalid_model_config",
"message": "Invalid model configuration: ..."
}
// File too large
{
"error": "file_too_large",
"message": "File size exceeds 200MB limit"
}
GET /api/jobs
List jobs for the current authenticated user.
Headers: Authorization: Bearer <token>
Query Parameters:
- limit (optional): Max results (default: 50, max: 100)
- offset (optional): Skip count for pagination (default: 0)
- status (optional): Filter by phase (e.g., "COMPLETED")
Response:
{
"jobs": [Job, Job, ...],
"pagination": {
"limit": 50,
"offset": 0,
"count": 15
}
}
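The query parameters can be applied to an in-memory job list as sketched below. paginate_jobs is a hypothetical helper, and treating count as the number of jobs returned on the page (rather than the total) is an assumption:

```python
from typing import Optional

def paginate_jobs(jobs: list, limit: int = 50, offset: int = 0,
                  status: Optional[str] = None) -> dict:
    """Filter by phase, then slice according to limit and offset."""
    limit = max(1, min(limit, 100))  # clamp to the documented maximum
    offset = max(0, offset)
    if status is not None:
        jobs = [job for job in jobs if job["phase"] == status]
    page = jobs[offset:offset + limit]
    return {
        "jobs": page,
        # Assumption: count is the size of this page, not the overall total.
        "pagination": {"limit": limit, "offset": offset, "count": len(page)},
    }
```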
GET /api/jobs/{job_id}
Get detailed information for a specific job.
Headers: Authorization: Bearer <token>
Response: Single Job object (same structure as POST /api/jobs)
Error:
{
"error": "not_found",
"message": "Job not found or access denied"
}
DELETE /api/jobs/{job_id}
Delete a job and all associated files.
Headers: Authorization: Bearer <token>
Response:
{
"message": "Job deleted successfully"
}
GET /api/jobs/{job_id}/download
Download the CSV result file for a completed job.
Headers: Authorization: Bearer <token>
Response:
- Content-Type: text/csv; charset=utf-8
- Content-Disposition: attachment; filename="brief-20251007174508.csv"
- Body: CSV file content
Error (Job Not Complete):
{
"error": "not_ready",
"message": "Job has not completed processing yet"
}
POST /api/jobs/batch-download
Download multiple CSV files as a ZIP archive.
Headers: Authorization: Bearer <token>
Request:
{
"jobIds": ["uuid1", "uuid2", "uuid3"]
}
Response:
- Content-Type: application/zip
- Content-Disposition: attachment; filename="brief-extractor-results-{timestamp}.zip"
- Body: ZIP file containing CSV files
GET /api/jobs/{job_id}/logs
Get processing logs for a specific job.
Headers: Authorization: Bearer <token>
Query Parameters:
- limit (optional): Max log entries (default: 100)
- level (optional): Filter by level (DEBUG, INFO, WARNING, ERROR)
Response:
{
"logs": [
{
"timestamp": "2025-10-07T17:42:46.474Z",
"level": "INFO",
"message": "Starting consolidation with 2 model results using openai-gpt5"
}
],
"count": 150
}
GET /api/jobs/stats
Get job processing statistics for the current user.
Headers: Authorization: Bearer <token>
Response:
{
"stats": {
"total": 25,
"completed": 20,
"failed": 2,
"processing": 3,
"queued": 0,
"totalAssetsExtracted": 487,
"totalCostUsd": 5.67,
"totalProcessingTime": 3600.5,
"averageAssetsPerJob": 24.35,
"averageCostPerJob": 0.283
}
}
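The averages in this example are consistent with dividing by the number of completed jobs (487 / 20 = 24.35). A sketch of that aggregation follows; summarize and the assets and cost_usd keys are hypothetical, and the choice of denominator is an assumption inferred from the numbers rather than confirmed behavior:

```python
def summarize(jobs: list) -> dict:
    """Aggregate per-user job records into the stats payload shape."""
    completed = [j for j in jobs if j["phase"] == "COMPLETED"]
    failed = sum(1 for j in jobs if j["phase"] == "FAILED")
    queued = sum(1 for j in jobs if j["phase"] == "QUEUED")
    total_assets = sum(j.get("assets", 0) for j in completed)
    total_cost = sum(j.get("cost_usd", 0.0) for j in completed)
    done = len(completed)
    return {
        "total": len(jobs),
        "completed": done,
        "failed": failed,
        "queued": queued,
        # Anything not queued, completed, or failed is mid-pipeline.
        "processing": len(jobs) - done - failed - queued,
        "totalAssetsExtracted": total_assets,
        "totalCostUsd": round(total_cost, 2),
        # Assumption: averages are taken over completed jobs only.
        "averageAssetsPerJob": round(total_assets / done, 2) if done else 0.0,
        "averageCostPerJob": round(total_cost / done, 3) if done else 0.0,
    }
```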
Configuration API
GET /api/config/models
List all available AI models with pricing and capabilities.
Response:
{
"models": [
{
"key": "openai-gpt5",
"name": "GPT-5",
"provider": "OpenAI",
"description": "Latest OpenAI model with advanced reasoning capabilities",
"costPer1mInput": 2.50,
"costPer1mOutput": 10.00,
"canBePrimary": true,
"canBeConsolidation": true
},
{
"key": "anthropic-sonnet4",
"name": "Claude Sonnet 4",
"provider": "Anthropic",
"description": "Balanced performance and cost",
"costPer1mInput": 3.00,
"costPer1mOutput": 15.00,
"canBePrimary": true,
"canBeConsolidation": true
}
]
}
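The per-million pricing fields translate into a per-call cost as sketched below. call_cost_usd is a hypothetical helper; it ignores cached-token discounts and any provider-specific billing rules, so it will not necessarily reproduce the costUsd values reported in job updates:

```python
# Prices in USD per 1M tokens, copied from the example response above.
PRICING = {
    "openai-gpt5": {"input": 2.50, "output": 10.00},
    "anthropic-sonnet4": {"input": 3.00, "output": 15.00},
}

def call_cost_usd(model_key: str, tokens_in: int, tokens_out: int) -> float:
    """Estimate the cost of one call from token counts and list prices."""
    price = PRICING[model_key]
    return (tokens_in * price["input"] + tokens_out * price["output"]) / 1_000_000

print(call_cost_usd("openai-gpt5", tokens_in=5000, tokens_out=3000))
```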
GET /api/config/defaults
Get default model configuration.
Response:
{
"config": {
"primaryModels": ["openai-gpt5", "anthropic-sonnet4", "google-gemini25"],
"consolidationModel": "openai-gpt5",
"minimumSuccessThreshold": 1
}
}
POST /api/config/estimate
Estimate processing cost before uploading.
Request:
{
"modelConfig": {
"primaryModels": ["openai-gpt5", "anthropic-sonnet4"],
"consolidationModel": "anthropic-opus4"
},
"fileSizeBytes": 1048576,
"estimatedTokens": 10000
}
Response:
{
"estimatedCostUsd": 0.45,
"breakdown": {
"openai-gpt5": 0.15,
"anthropic-sonnet4": 0.12,
"anthropic-opus4": 0.18
},
"estimatedTokens": {
"input": 8000,
"output": 6000,
"total": 14000
},
"estimatedTime": "90-180 seconds"
}
POST /api/config/validate
Validate model configuration before submission.
Request:
{
"modelConfig": {
"primaryModels": ["invalid-model"],
"consolidationModel": "openai-gpt5"
}
}
Response:
{
"valid": false,
"errors": [
"Primary model 'invalid-model' is not available"
],
"warnings": [
"Using only 1 primary model - consider using 2-3 for better accuracy"
],
"modelCount": {
"primary": 1,
"consolidation": 1,
"total": 2
}
}
Data Models and Schemas
Job Data Model
@dataclass
class Job:
    id: str  # UUID
    file_name: str  # Original filename
    file_size: int  # Bytes
    created_at: datetime  # UTC timestamp
    updated_at: datetime  # UTC timestamp
    user_id: str  # Azure AD user OID
    upload_path: str  # Disk path
    output_path: Optional[str]  # CSV path (when complete)
    phase: JobPhase  # Current processing phase
    progress_pct: int  # 0-100
    step_label: str  # Human-readable step
    provider_updates: Dict[str, ProviderUpdate]  # Per-model status
    error: Optional[str]  # Error message if failed
    result_csv_url: Optional[str]  # Download endpoint
    summary: Optional[JobSummary]  # Completion summary
    logs: List[LogEntry]  # Processing logs
    model_config: ModelConfiguration  # AI model settings
Job Phases
stateDiagram-v2
[*] --> QUEUED: Job created
QUEUED --> EXTRACT_CONTENT: Worker picks up
EXTRACT_CONTENT --> LLM_ANALYSIS: Content extracted
LLM_ANALYSIS --> CONSOLIDATION: Analysis complete
CONSOLIDATION --> CSV_GENERATION: Results consolidated
CSV_GENERATION --> COMPLETED: CSV written
EXTRACT_CONTENT --> FAILED: Extraction error
LLM_ANALYSIS --> FAILED: All models failed
CONSOLIDATION --> FAILED: Consolidation error
CSV_GENERATION --> FAILED: Write error
COMPLETED --> [*]
FAILED --> [*]
note right of QUEUED
Progress: 0%
Waiting for worker
end note
note right of EXTRACT_CONTENT
Progress: 10-25%
LlamaParser API
end note
note right of LLM_ANALYSIS
Progress: 25-75%
Parallel model execution
end note
note right of CONSOLIDATION
Progress: 75-90%
Single model merging
end note
note right of CSV_GENERATION
Progress: 90-100%
File write
end note
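The transitions in the diagram can be encoded as a small lookup table. The JobPhase enum below is a sketch reconstructed from the phase names in the diagram, and ALLOWED and can_transition are hypothetical names, not necessarily how the backend enforces ordering:

```python
from enum import Enum

class JobPhase(str, Enum):
    QUEUED = "QUEUED"
    EXTRACT_CONTENT = "EXTRACT_CONTENT"
    LLM_ANALYSIS = "LLM_ANALYSIS"
    CONSOLIDATION = "CONSOLIDATION"
    CSV_GENERATION = "CSV_GENERATION"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

_PIPELINE = [
    JobPhase.QUEUED, JobPhase.EXTRACT_CONTENT, JobPhase.LLM_ANALYSIS,
    JobPhase.CONSOLIDATION, JobPhase.CSV_GENERATION, JobPhase.COMPLETED,
]

# Each phase may advance to the next one in the pipeline.
ALLOWED = {cur: {nxt} for cur, nxt in zip(_PIPELINE, _PIPELINE[1:])}
# The four processing phases may also end in FAILED.
for phase in _PIPELINE[1:-1]:
    ALLOWED[phase].add(JobPhase.FAILED)

def can_transition(current: JobPhase, target: JobPhase) -> bool:
    """Return True if the diagram contains an edge from current to target."""
    return target in ALLOWED.get(current, set())
```

COMPLETED and FAILED have no entry in the table, so they behave as terminal states.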
Provider Update Model
@dataclass
class ProviderUpdate:
    provider: str  # 'openai', 'anthropic', 'google'
    model: str  # 'gpt-5', 'claude-sonnet-4', etc.
    status: str  # 'started', 'success', 'error'
    started_at: Optional[str]  # ISO timestamp
    completed_at: Optional[str]  # ISO timestamp
    latency_ms: Optional[float]  # Processing duration
    tokens_in: Optional[int]  # Input tokens
    tokens_out: Optional[int]  # Output tokens
    tokens_cached: Optional[int]  # Cached tokens (cost reduction)
    cost_usd: Optional[float]  # Estimated cost
    error: Optional[str]  # Error message if failed
Base Deliverable Schema
Purpose: Intermediate format with multiplier arrays (before expansion)
class BaseDeliverable(BaseModel):
    # Metadata (String Fields)
    title: str  # Required
    status: Optional[str] = ""  # "Draft", "In Progress", "Final"
    category: Optional[str] = ""  # "Paid Social - Meta Feed"
    media: Optional[str] = ""  # "IMAGE", "VIDEO", "COPY"
    asset_type: Optional[str] = ""  # "JPG", "PNG", "MP4"
    brand_identifier: Optional[str] = ""  # "adidas TERREX"
    # Multiplier Arrays (Expansion Fields)
    technical_specifications: Optional[List[str]] = []  # ["1080x1080", "1920x1080"]
    language_country_market: Optional[List[str]] = []  # ["EN-UK", "DE-DE", "IT-IT"]
    # Dates and References (String Fields)
    review_date: Optional[str] = ""  # "2024-08-08"
    live_date: Optional[str] = ""  # "08/08"
    end_date: Optional[str] = ""  # "2025-12-31"
    reference_material: Optional[str] = ""  # URLs or notes
    # Metadata (String Fields)
    quantity: Optional[str] = "1"  # For validation
    page_number: Optional[str] = ""  # "3-4"
    priority_level: Optional[str] = ""  # "High"
    creative_direction: Optional[str] = ""  # Design requirements
Multiplier Expansion Example:
graph LR
Base["Base Deliverable<br/><br/>Title: Hero Slider<br/>Specs: [750x1200, 1920x853]<br/>Markets: [IT-IT]<br/>Quantity: 2"]
subgraph Expansion["Cartesian Product Expansion"]
Combo1["750x1200 × IT-IT"]
Combo2["1920x853 × IT-IT"]
end
Asset1["Asset 1<br/>Hero Slider (750x1200, IT-IT)<br/>Quantity: 1"]
Asset2["Asset 2<br/>Hero Slider (1920x853, IT-IT)<br/>Quantity: 1"]
Base --> Expansion
Combo1 --> Asset1
Combo2 --> Asset2
style Base fill:#fff3e0
style Expansion fill:#e3f2fd
style Asset1 fill:#e8f5e9
style Asset2 fill:#e8f5e9
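The expansion in the diagram is a Cartesian product over the two multiplier arrays. A minimal sketch with plain dictionaries follows; expand is a hypothetical name, and treating an empty array as a single blank value (so the deliverable still yields one asset) is an assumption:

```python
from itertools import product

def expand(base: dict) -> list:
    """Expand one base deliverable into individual assets."""
    # Assumption: an empty or missing array counts as one blank value.
    specs = base.get("technical_specifications") or [""]
    markets = base.get("language_country_market") or [""]
    assets = []
    for spec, market in product(specs, markets):
        label = ", ".join(part for part in (spec, market) if part)
        asset = dict(base)
        asset.update(
            title=f"{base['title']} ({label})" if label else base["title"],
            technical_specifications=spec,
            language_country_market=market,
            quantity="1",  # each expanded asset is a single item
        )
        assets.append(asset)
    return assets

hero = {
    "title": "Hero Slider",
    "technical_specifications": ["750x1200", "1920x853"],
    "language_country_market": ["IT-IT"],
    "quantity": "2",
}
assets = expand(hero)
```

The number of expanded assets (2 here) can then be compared with the declared quantity as a consistency check.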
Marketing Asset Schema
Purpose: Final individual assets for CSV export (after expansion)
class MarketingAsset(BaseModel):
    # All fields become strings (arrays expanded into individual assets)
    title: str  # "Hero Slider (750x1200, IT-IT)"
    status: Optional[str] = ""
    category: Optional[str] = ""
    media: Optional[str] = ""
    asset_type: Optional[str] = ""
    brand_identifier: Optional[str] = ""
    technical_specifications: Optional[str] = ""  # Single value: "750x1200"
    review_date: Optional[str] = ""
    live_date: Optional[str] = ""
    end_date: Optional[str] = ""
    reference_material: Optional[str] = ""
    language_country_market: Optional[str] = ""  # Single value: "IT-IT"
    quantity: Optional[str] = "1"  # Always "1" for individuals
    page_number: Optional[str] = ""
    priority_level: Optional[str] = ""
    creative_direction: Optional[str] = ""
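Each asset in this flat form maps onto one CSV row. The sketch below writes rows with the standard csv module, using the 16-column order documented in the CSV Output Format section. assets_to_csv is a hypothetical name, and the unquoted header with fully quoted data rows is inferred from the sample output rather than taken from the backend code:

```python
import csv
import io

CSV_COLUMNS = [
    "title", "category", "media", "asset_type",
    "technical_specifications", "language_country_market", "quantity",
    "brand_identifier", "review_date", "live_date", "end_date",
    "reference_material", "page_number", "priority_level",
    "creative_direction", "status",
]

def assets_to_csv(assets: list) -> str:
    """Render asset dictionaries as CSV text in the documented column order."""
    buffer = io.StringIO()
    # Header is written unquoted; data rows quote every field.
    buffer.write(",".join(CSV_COLUMNS) + "\n")
    writer = csv.DictWriter(
        buffer, fieldnames=CSV_COLUMNS, restval="",
        extrasaction="ignore", quoting=csv.QUOTE_ALL, lineterminator="\n",
    )
    writer.writerows(assets)
    return buffer.getvalue()
```

Quoting every field keeps titles that contain commas, such as "Hero Slider (750x1200, IT-IT)", in a single column.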
CSV Output Format (16 Columns)
title,category,media,asset_type,technical_specifications,language_country_market,quantity,brand_identifier,review_date,live_date,end_date,reference_material,page_number,priority_level,creative_direction,status
"Hero Slider (750x1200, IT-IT)","Wholesale - Hero Slider","IMAGE","JPG","750x1200","IT-IT","1","adidas TERREX","2024-08-08","08/08","","https://drive.google.com/...","3-4","High","Adapt as per layouts...","Draft"
Configuration Management
Environment Variables
Backend Configuration (.env in project root):
# =============================================================================
# API KEYS (Required)
# =============================================================================
OPENAI_API_KEY=sk-... # OpenAI GPT-5 access
ANTHROPIC_API_KEY=sk-ant-api03-... # Anthropic Claude access
GOOGLE_API_KEY=AIzaSy... # Google Gemini access
LLAMACLOUD_API_KEY=llx-... # LlamaParser cloud service
# =============================================================================
# OPENAI CONFIGURATION
# =============================================================================
OPENAI_MODEL=gpt-5
OPENAI_REASONING_EFFORT=medium # high, medium, low, minimal
OPENAI_TIMEOUT=3600 # 1 hour (for long documents)
OPENAI_MAX_RETRIES=2
# =============================================================================
# ANTHROPIC CONFIGURATION
# =============================================================================
ANTHROPIC_MODEL_OPUS=claude-opus-4-1-20250805
ANTHROPIC_MODEL_SONNET=claude-sonnet-4-20250514
ANTHROPIC_TEMPERATURE=1 # Higher for creative analysis
ANTHROPIC_MAX_TOKENS=32000 # Opus output limit (Sonnet 4 allows up to 64000)
ANTHROPIC_THINKING_BUDGET=12000 # Thinking tokens
ANTHROPIC_TIMEOUT=300 # 5 minutes
# =============================================================================
# GOOGLE CONFIGURATION
# =============================================================================
GOOGLE_MODEL=gemini-2.5-pro
GOOGLE_TEMPERATURE=0.7
GOOGLE_MAX_OUTPUT_TOKENS=100000
GOOGLE_THINKING_BUDGET=12000
GOOGLE_TIMEOUT=3600
# =============================================================================
# PROCESSING CONFIGURATION
# =============================================================================
DEFAULT_PRIMARY_MODELS=openai-gpt5,anthropic-sonnet4,google-gemini25
DEFAULT_CONSOLIDATION_MODEL=openai-gpt5
MINIMUM_SUCCESS_THRESHOLD=1 # Min models that must succeed
ENABLE_COST_ESTIMATION=true
MAX_PROCESSING_COST_USD=10.00
# =============================================================================
# MSAL AUTHENTICATION (Azure AD)
# =============================================================================
MSAL_CLIENT_ID=9079054c-9620-4757-a256-23413042f1ef
MSAL_CLIENT_SECRET=placeholder # Not used for PKCE flow
MSAL_TENANT_ID=e519c2e6-bc6d-4fdf-8d9c-923c2f002385
MSAL_REDIRECT_URI=https://ai-sandbox.oliver.solutions/brief-extractor/
MSAL_AUTHORITY=https://login.microsoftonline.com/e519c2e6-bc6d-4fdf-8d9c-923c2f002385
# =============================================================================
# SECURITY AND RUNTIME
# =============================================================================
DEV_MODE=false # MUST be false in production!
ALLOWED_ORIGINS=https://ai-sandbox.oliver.solutions
SESSION_SECRET=<random-secret-here>
SECURE_COOKIES=true # true for HTTPS
HTTPS_ONLY=true # true for production
# =============================================================================
# JOB PROCESSING
# =============================================================================
MAX_CONCURRENT_JOBS=5 # Parallel job processing limit
MAX_UPLOAD_SIZE_MB=200 # Per-file upload limit
FILE_RETENTION_HOURS=24 # Auto-cleanup threshold
WS_PING_INTERVAL_SECONDS=30 # WebSocket heartbeat
# =============================================================================
# SERVER CONFIGURATION
# =============================================================================
SERVER_HOST=0.0.0.0
SERVER_PORT=8002
SERVER_WORKERS=2 # Hypercorn workers (ignored when the app is started through Hypercorn's programmatic serve())
Frontend Configuration (frontend/.env):
# Backend API and WebSocket URLs (embedded at build time)
# Production
VITE_API_URL=https://ai-sandbox.oliver.solutions/brief-extractor-back/api
VITE_WS_URL=wss://ai-sandbox.oliver.solutions/brief-extractor-back
# Local Development (comment out production, uncomment below)
# VITE_API_URL=http://localhost:8000/api
# VITE_WS_URL=ws://localhost:8000
Build Configuration (frontend/vite.config.ts):
export default defineConfig({
base: '/brief-extractor/', // Deployment path prefix
plugins: [react()],
resolve: {
alias: {
'@': path.resolve(__dirname, './src') // Import alias
}
},
server: {
port: 3000,
proxy: { // Dev server proxying
'/api': {
target: 'http://localhost:8000',
changeOrigin: true
},
'/ws': {
target: 'ws://localhost:8000',
ws: true
}
}
}
})
Configuration Loading Priority
Backend:
- Environment variables from .env file
- Default values in core/config.py and server/config_runtime.py
- Runtime overrides (future feature)
Frontend:
- Build-time environment variables (VITE_*)
- Fallback defaults in code (e.g., /api for VITE_API_URL)
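On the backend, the environment-then-default order can be sketched as follows. The helper names are hypothetical, and the defaults simply mirror the example .env values in this section rather than the actual contents of core/config.py:

```python
import os
from typing import Mapping

def env_bool(name: str, default: bool, env: Mapping = os.environ) -> bool:
    """Read a boolean setting, falling back to default when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}

def env_int(name: str, default: int, env: Mapping = os.environ) -> int:
    """Read an integer setting, falling back to default when unset."""
    raw = env.get(name)
    return default if raw is None else int(raw)

def load_settings(env: Mapping = os.environ) -> dict:
    # Assumed defaults, mirroring the example .env above.
    return {
        "dev_mode": env_bool("DEV_MODE", False, env),
        "max_concurrent_jobs": env_int("MAX_CONCURRENT_JOBS", 5, env),
        "max_upload_size_mb": env_int("MAX_UPLOAD_SIZE_MB", 200, env),
        "server_port": env_int("SERVER_PORT", 8002, env),
    }
```

Defaulting DEV_MODE to false means a missing variable can never disable authentication by accident.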
Deployment Architecture
Production Deployment Topology
graph TB
Internet["Internet<br/>(HTTPS/WSS)"]
subgraph Server["Production Server: ai-sandbox.oliver.solutions"]
Apache["Apache Web Server<br/>Port 443<br/>- SSL/TLS Termination<br/>- Virtual Host<br/>- ProxyPass WebSocket<br/>- Serve static files"]
subgraph Static["Static Files"]
Frontend["/brief-extractor/<br/>/var/www/html/brief-extractor/dist/"]
end
subgraph Backend["Quart Application"]
Hypercorn["Hypercorn ASGI Server<br/>Port 8002<br/>Systemd service"]
Workers["5 Async Job Processors"]
WSSupport["WebSocket Support"]
Storage["File Storage<br/>/server/data/"]
end
end
subgraph External["External APIs"]
OpenAI["OpenAI API"]
Anthropic["Anthropic API"]
Google["Google AI API"]
Llama["LlamaCloud API"]
AzureAD["Azure AD<br/>Authentication"]
end
Internet -->|"HTTPS"| Apache
Apache -->|"Proxy<br/>/brief-extractor-back/"| Hypercorn
Apache -.->|"Serve"| Frontend
Hypercorn --> Workers
Hypercorn --> WSSupport
Workers --> Storage
Workers --> OpenAI
Workers --> Anthropic
Workers --> Google
Workers --> Llama
Frontend --> AzureAD
style Apache fill:#f9f9f9
style Hypercorn fill:#e3f2fd
style Workers fill:#e8f5e9
style Frontend fill:#fff3e0
Apache Configuration
# Brief Extractor - WebSocket and HTTP proxy
ProxyPass /brief-extractor-back/ws ws://localhost:8002/ws
ProxyPass /brief-extractor-back/ http://localhost:8002/
ProxyPassReverse /brief-extractor-back/ http://localhost:8002/
# Static frontend files
Alias /brief-extractor /var/www/html/brief-extractor/dist
<Directory /var/www/html/brief-extractor/dist>
Options -Indexes +FollowSymLinks
AllowOverride None
Require all granted
# SPA routing support
RewriteEngine On
RewriteBase /brief-extractor/
RewriteRule ^index\.html$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /brief-extractor/index.html [L]
</Directory>
# Required Apache modules
# sudo a2enmod proxy proxy_http proxy_wstunnel rewrite
Systemd Service Configuration
Service File: /etc/systemd/system/brief-extractor.service
[Unit]
Description=Brief Extractor Backend Service
After=network.target
[Service]
Type=simple
User=www-data
WorkingDirectory=/var/www/html/brief-extractor/backend
Environment="PATH=/var/www/html/brief-extractor/backend/venv/bin:/usr/bin"
ExecStart=/var/www/html/brief-extractor/backend/venv/bin/python -m server.app
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Service Management:
# Start service
sudo systemctl start brief-extractor
# Enable on boot
sudo systemctl enable brief-extractor
# View logs
sudo journalctl -u brief-extractor -f
# Restart after code update
sudo systemctl restart brief-extractor
Build and Deployment Process
Backend Deployment:
# 1. Update code on server
cd /var/www/html/brief-extractor/backend
git pull origin main
# 2. Activate virtual environment
source venv/bin/activate
# 3. Install/update dependencies
pip install -r requirements_enhanced.txt
# 4. Update .env file (if needed)
nano .env
# 5. Restart service
sudo systemctl restart brief-extractor
# 6. Verify deployment
curl https://ai-sandbox.oliver.solutions/brief-extractor-back/health
Frontend Deployment:
# 1. On development machine, configure production URLs
cd frontend
nano .env # Ensure VITE_API_URL points to production
# 2. Build for production
npm run build
# 3. Deploy to server
scp -r dist/* user@server:/var/www/html/brief-extractor/dist/
# 4. Verify deployment
# Visit: https://ai-sandbox.oliver.solutions/brief-extractor/
Environment-Specific Builds:
# Build for production server
VITE_API_URL=https://ai-sandbox.oliver.solutions/brief-extractor-back/api \
VITE_WS_URL=wss://ai-sandbox.oliver.solutions/brief-extractor-back \
npm run build
# Build for local development
VITE_API_URL=http://localhost:8000/api \
VITE_WS_URL=ws://localhost:8000 \
npm run build
Error Handling and Logging
Backend Logging Strategy
Structured Logging (Structlog):
import structlog
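structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="ISO"),
        structlog.processors.JSONRenderer()  # JSON for production
    ],
    # filter_by_level and add_logger_name require a stdlib-backed logger
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
)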
logger = structlog.get_logger(__name__)
logger.info("Event occurred", key="value", user_id=user_id)
Output Format:
{
"event": "Job processing completed",
"logger": "server.runners.job_runner",
"level": "info",
"timestamp": "2025-10-07T17:45:08.132Z",
"job_id": "uuid",
"assets_extracted": 19,
"cost_usd": 0.2759
}
Log Levels:
- DEBUG: Detailed diagnostic information (disabled in production)
- INFO: General informational messages (job lifecycle, API calls)
- WARNING: Warning conditions (token validation issues, model failures)
- ERROR: Error events (job failures, API errors, exceptions)
Key Logging Points:
Job Lifecycle:
logger.info(f"Created job {job.id} for file {file_name} (user: {user_id})")
logger.info(f"Processing job {job_id}: {job.file_name}")
logger.info(f"Job {job_id} completed successfully: {assets} assets, ${cost}, {time}s")
logger.error(f"Job {job_id} failed: {error}", exc_info=True)
AI Model Calls:
# Standard success logging
logger.info("[INITIAL] Structured output validated: 9 assets")
# Verbose error logging (only when problems occur)
logger.error("[CONSOLIDATION] ========== MISSING 'assets' KEY ==========")
logger.error(f"[CONSOLIDATION] Full raw content: {response.content}")
logger.error("[CONSOLIDATION] Debug file saved: /tmp/consolidation_debug_*.json")
WebSocket Events:
logger.info(f"Registered WebSocket client {client_id} for user {user_id}")
logger.warning("WebSocket connection rejected - no valid authentication")
logger.debug(f"Broadcast message to {sent_count} clients for user {user_id}")
Error Handling Patterns
API Error Responses:
# Validation Error (400)
return jsonify({
'error': 'invalid_request',
'message': 'Specific validation error details'
}), 400
# Authentication Error (401)
return jsonify({
'error': 'unauthorized',
'message': 'Valid authentication required'
}), 401
# Not Found (404)
return jsonify({
'error': 'not_found',
'message': 'Job not found or access denied'
}), 404
# Internal Error (500)
return jsonify({
'error': 'server_error',
'message': 'Internal server error'
}), 500
Job Processing Errors:
try:
    # Processing stages...
    result = await analyzer.process_document_multi_model(...)
except Exception as e:
    # Capture and report error
    error_msg = f"Job processing failed: {str(e)}"
    logger.error(f"Job {job.id} failed: {error_msg}", exc_info=True)
    # Update job state
    await progress.emit_failure(error_msg)
    job.mark_failed(error_msg)
    # Broadcast via WebSocket
    await ws_manager.broadcast_job_update(job.id, {
        'type': 'job.failed',
        'jobId': job.id,
        'error': error_msg
    })
    return False  # Job failed
Partial Failure Handling:
# LLM Analysis - allow partial success
responses, metadata = await provider_manager.execute_parallel_analysis(
models=['openai-gpt5', 'anthropic-sonnet4', 'google-gemini25'],
minimum_success_threshold=1 # At least 1 must succeed
)
# If 2 out of 3 models fail, processing continues with 1 result
# Consolidation still occurs, just with less data diversity
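The partial-success behavior can be illustrated with `asyncio.gather(..., return_exceptions=True)`. This is a sketch under assumptions: `run_with_threshold` is a hypothetical stand-in for `execute_parallel_analysis`, whose real signature and internals are not shown in this document, and the `ok`/`boom` coroutines simulate provider calls.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple


async def run_with_threshold(
    calls: Dict[str, Callable[[], Awaitable[str]]],
    minimum_success_threshold: int = 1,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run all model calls concurrently and tolerate failures down to the threshold."""
    names = list(calls)
    # return_exceptions=True keeps one failing model from aborting the others.
    outcomes = await asyncio.gather(
        *(calls[name]() for name in names), return_exceptions=True
    )
    successes: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            failures[name] = str(outcome)
        else:
            successes[name] = outcome
    if len(successes) < minimum_success_threshold:
        raise RuntimeError(f"Only {len(successes)} model(s) succeeded: {failures}")
    return successes, failures


async def ok() -> str:
    return '{"assets": []}'  # simulated successful response


async def boom() -> str:
    raise TimeoutError("provider timed out")  # simulated provider failure


successes, failures = asyncio.run(
    run_with_threshold(
        {"openai-gpt5": boom, "anthropic-sonnet4": ok, "google-gemini25": boom}
    )
)
print(sorted(successes), sorted(failures))
```

With two of three simulated providers failing, the call still returns the single successful response, matching the "at least 1 must succeed" rule.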
Debug Artifact Generation
When Consolidation Fails:
# Automatic debug file creation
debug_file = f"/tmp/consolidation_debug_{timestamp}.json"
{
"timestamp": "20251007_174500",
"consolidation_model": "gpt-5",
"raw_content": "{}", # Empty response
"parsed_data": {},
"primary_analysis_results": [
{
"provider": "anthropic",
"model": "claude-sonnet-4",
"success": true,
"deliverable_count": 9
}
],
"token_usage": {...}
}
Location: /tmp/ directory on server
Purpose: Post-mortem analysis of API responses
Includes: Full request context, model outputs, token stats
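A debug artifact of this shape can be written in a few lines. The sketch below is illustrative only: `save_debug_artifact` is a hypothetical helper, and it writes to a temporary directory rather than `/tmp/` so that it can be run anywhere.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any, Dict


def save_debug_artifact(payload: Dict[str, Any], directory: str) -> Path:
    """Write payload to consolidation_debug_<timestamp>.json and return the path."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(directory) / f"consolidation_debug_{timestamp}.json"
    record = {"timestamp": timestamp, **payload}
    # default=str keeps non-JSON types (e.g. datetimes) from aborting the dump.
    path.write_text(json.dumps(record, indent=2, default=str), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    artifact = save_debug_artifact(
        {"consolidation_model": "gpt-5", "raw_content": "{}", "parsed_data": {}},
        tmp,
    )
    saved = json.loads(artifact.read_text(encoding="utf-8"))
    print(artifact.name, saved["consolidation_model"])
```

Because the file name has one-second resolution, two failures within the same second would overwrite each other; a job ID suffix would avoid that if it matters.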
Performance and Scalability
Performance Characteristics
Processing Times (Typical 10-page Brief):
- Content Extraction: 10-30 seconds (LlamaParser)
- Parallel Analysis: 50-120 seconds (limited by slowest model)
- GPT-5: 90-110 seconds
- Claude Sonnet: 60-80 seconds
- Gemini: 40-50 seconds
- Consolidation: 60-90 seconds (single model)
- CSV Generation: <1 second
- Total: 2-4 minutes end-to-end
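The end-to-end figure follows from the stage timings: sequential stages add up, while the parallel stage takes as long as its slowest model. The sketch below redoes that arithmetic with the per-model numbers listed above, assuming all three models run; the function name and the `Range` alias are illustrative.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # (low, high) in seconds

extraction: Range = (10, 30)
consolidation: Range = (60, 90)
models: Dict[str, Range] = {
    "GPT-5": (90, 110),
    "Claude Sonnet": (60, 80),
    "Gemini": (40, 50),
}


def end_to_end(extraction: Range, models: Dict[str, Range], consolidation: Range) -> Range:
    """Sequential stages add; the parallel stage costs as much as its slowest model."""
    parallel_low = max(low for low, _ in models.values())
    parallel_high = max(high for _, high in models.values())
    return (
        extraction[0] + parallel_low + consolidation[0],
        extraction[1] + parallel_high + consolidation[1],
    )


low, high = end_to_end(extraction, models, consolidation)
print(f"{low}-{high} s ({low / 60:.1f}-{high / 60:.1f} min)")
```

With these inputs the estimate is 160-230 seconds, which sits inside the stated 2-4 minute range. CSV generation is ignored because it takes under a second.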
Token Usage (Typical):
- Input: 8,000-12,000 tokens (document + prompt)
- Output: 2,000-6,000 tokens per model
- Total: 30,000-50,000 tokens across all models
Cost (Typical):
- 3-Model Analysis: $0.20-$0.40
- With Premium Consolidation (Opus): +$0.15-$0.25
- Average Per Document: $0.25-$0.45
Scalability Analysis
Concurrent Processing:
MAX_CONCURRENT_JOBS=5 → 5 documents processed simultaneously
Each job uses 3 primary models + 1 consolidation = 4 LLM calls
Maximum concurrent LLM calls: 5 jobs × 3 models = 15 parallel API calls
(Consolidation sequential within each job)
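A concurrency cap like `MAX_CONCURRENT_JOBS` is commonly enforced with a semaphore. The following sketch shows the idea with dummy jobs; it is an assumption about the mechanism, not a description of the project's queue, and `run_jobs` is a hypothetical name.

```python
import asyncio

MAX_CONCURRENT_JOBS = 5  # mirrors the setting above


async def run_jobs(job_count: int, limit: int = MAX_CONCURRENT_JOBS) -> int:
    """Run job_count dummy jobs and return the peak number running at once."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def process(job_id: int) -> None:
        nonlocal running, peak
        async with semaphore:  # waits here when `limit` jobs are already active
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for extraction + LLM calls
            running -= 1

    await asyncio.gather(*(process(i) for i in range(job_count)))
    return peak


peak = asyncio.run(run_jobs(job_count=12))
print(f"peak concurrent jobs: {peak}")
```

Twelve submitted jobs never exceed five in flight; the remaining seven wait at the semaphore until a slot frees up.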
Memory Footprint:
Per Job:
- Uploaded file: ~1-50 MB (in memory during upload, then on disk)
- Extracted content: ~50-500 KB (markdown)
- LLM responses: ~10-50 KB each × 4 models = 40-200 KB
- Job metadata: ~5-10 KB
Total per job: ~1-50 MB (mostly file content)
With 5 concurrent jobs: ~5-250 MB total
Network Bandwidth:
Upload: User → Server
- Per document: 1-50 MB
- Rate limiting: None (could add via nginx)
LLM API Calls: Server → AI Providers
- Per request: 10-100 KB (prompts)
- Per response: 5-50 KB (structured JSON)
- Concurrent: 15 simultaneous connections
Download: Server → User
- Per CSV: 5-500 KB (typically < 100 KB)
- Batch ZIP: Up to 10 MB for large batches
Bottlenecks and Optimizations
Current Bottlenecks:
- LLM API Latency: 50-120 seconds (external dependency)
  - Mitigation: Parallel execution reduces total time
  - Future: Caching for similar documents
- File Upload Speed: Network dependent
  - Mitigation: Chunked upload (future)
  - Compression: Could reduce bandwidth 50-70%
- Concurrent Job Limit: MAX_CONCURRENT_JOBS=5
  - Rationale: Cost control, API rate limits
  - Tunable: Can increase to 10-20 with monitoring
Optimization Strategies:
1. Token Caching (Prompt Caching):
# OpenAI and Anthropic support prompt caching
# Repeated analysis of similar documents reuses cached context
# Savings: 50-90% on input token costs
2. Result Caching (Future):
# Cache analysis results by document hash
# If same file uploaded again, return cached result
# Savings: 100% cost reduction for duplicates
3. Async Everything:
# File I/O: run_in_executor() for blocking operations
# Database: Currently in-memory (future: async DB driver)
# API calls: Native async clients (AsyncOpenAI, AsyncAnthropic)
4. Smart Model Selection:
# Cost-optimized: GPT-5 + Gemini (cheapest)
# Quality-optimized: All 3 models
# Speed-optimized: Sonnet + Gemini (fastest)
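The result-caching idea (strategy 2 above, marked as future work) could look roughly like this. It is a sketch only: `ResultCache` and `fake_analyze` are hypothetical, the cache is in-memory, and SHA-256 is one reasonable choice of document hash rather than a documented decision.

```python
import hashlib
from typing import Callable, Dict


class ResultCache:
    """In-memory cache keyed by the SHA-256 digest of the uploaded bytes."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}
        self.hits = 0

    @staticmethod
    def key_for(file_data: bytes) -> str:
        return hashlib.sha256(file_data).hexdigest()

    def get_or_compute(self, file_data: bytes, analyze: Callable[[bytes], dict]) -> dict:
        key = self.key_for(file_data)
        if key in self._results:
            self.hits += 1  # duplicate upload: skip the LLM calls entirely
            return self._results[key]
        result = analyze(file_data)
        self._results[key] = result
        return result


def fake_analyze(file_data: bytes) -> dict:
    # Stand-in for the real multi-model pipeline.
    return {"assets": [], "size": len(file_data)}


cache = ResultCache()
first = cache.get_or_compute(b"%PDF-1.7 sample", fake_analyze)
second = cache.get_or_compute(b"%PDF-1.7 sample", fake_analyze)
print(cache.hits, first is second)
```

In a multi-tenant deployment the cache key would probably need to include the user ID as well, so that one tenant's upload cannot surface another tenant's cached result.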
Monitoring and Observability
Health Check Endpoint:
curl https://ai-sandbox.oliver.solutions/brief-extractor-back/health
{
"status": "healthy",
"timestamp": "2025-10-07T17:45:00.000Z",
"queue": {
"pending": 2,
"active": 3
},
"websockets": {
"total_connections": 5,
"unique_users": 3
},
"config": {
"devMode": false,
"maxConcurrentJobs": 5,
"maxUploadSize": "200MB"
}
}
Metrics to Monitor:
- Queue depth (queue.pending)
- Active jobs (queue.active)
- WebSocket connection count
- Average processing time per job
- Error rate (failed jobs / total jobs)
- API response times
- Token usage and costs
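Some of these metrics can be derived directly from the health check payload. The sketch below parses a trimmed copy of the sample response and flags two conditions; `health_warnings` and the `max_pending` threshold of 10 are assumptions chosen for illustration, not values from the project.

```python
import json
from typing import Dict, List


def health_warnings(health: Dict, max_pending: int = 10) -> List[str]:
    """Return human-readable warnings derived from a /health payload."""
    warnings: List[str] = []
    if health.get("status") != "healthy":
        warnings.append(f"status is {health.get('status')!r}")
    queue = health.get("queue", {})
    limit = health.get("config", {}).get("maxConcurrentJobs")
    if queue.get("pending", 0) > max_pending:
        warnings.append(f"queue depth {queue['pending']} exceeds {max_pending}")
    if limit is not None and queue.get("active", 0) >= limit:
        warnings.append(f"all {limit} job slots are busy")
    return warnings


sample = json.loads("""
{"status": "healthy",
 "queue": {"pending": 2, "active": 3},
 "websockets": {"total_connections": 5, "unique_users": 3},
 "config": {"devMode": false, "maxConcurrentJobs": 5, "maxUploadSize": "200MB"}}
""")
print(health_warnings(sample))
```

A cron job or monitoring agent could fetch the endpoint, pass the decoded JSON to a function like this, and alert when the list is non-empty.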
Logging Integration:
# Systemd journal
sudo journalctl -u brief-extractor -f --since "1 hour ago"
# Filter by log level
sudo journalctl -u brief-extractor -p err
# Filter by job ID
sudo journalctl -u brief-extractor | grep "job_id=uuid"
Development Guide
Local Development Setup
Prerequisites:
- Python 3.13+ with virtual environment support
- Node.js 18+ with npm
- Git for version control
- Azure AD app registration (for auth testing)
Backend Setup:
# 1. Clone repository
git clone <repo_url>
cd adi-o3-multipass
# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # Mac/Linux
# or: venv\Scripts\activate # Windows
# 3. Install dependencies
pip install -r requirements_enhanced.txt
# 4. Configure environment
cp .env.example .env
nano .env # Add API keys, set DEV_MODE=true
# 5. Verify configuration
python -c "from core.config import config; print(config.validate_api_keys())"
# 6. Run development server
python -m server.app
# Server starts on http://0.0.0.0:8000
Frontend Setup:
# 1. Navigate to frontend
cd frontend
# 2. Install dependencies
npm install
# 3. Configure environment
cp .env.example .env
nano .env # Set local development URLs
# Example .env for local dev:
# VITE_API_URL=http://localhost:8000/api
# VITE_WS_URL=ws://localhost:8000
# 4. Start development server
npm run dev
# Server starts on http://localhost:3000
# 5. Open browser
open http://localhost:3000
Development Workflow
Typical Development Session:
# Terminal 1: Backend
cd adi-o3-multipass
source venv/bin/activate
python -m server.app
# Terminal 2: Frontend
cd adi-o3-multipass/frontend
npm run dev
# Terminal 3: Logs
sudo journalctl -u brief-extractor -f
# or: tail -f server/processing.log
Hot Reload:
- Frontend: Vite HMR (instant updates on save)
- Backend: Manual restart required (no auto-reload in production mode)
- Development: Set DEBUG=true for auto-reload (not recommended for async code)
Testing
Backend Unit Tests (Future):
# tests/test_job_manager.py
async def test_create_job():
manager = JobManager.get_instance()
job = await manager.create_job(
file_name="test.pdf",
file_size=1024,
file_data=b"...",
user_id="test-user"
)
assert job.phase == JobPhase.QUEUED
Frontend Testing:
# Type checking
npm run type-check
# Linting
npm run lint
# Unit tests (future)
npm run test
Integration Testing:
# Test full pipeline with sample document
python core/process_brief_enhanced.py examples/sample_brief.pdf \
--primary-models openai-gpt5,anthropic-sonnet4 \
--consolidation-model anthropic-opus4
Debugging Tips
Backend Debugging:
# Add breakpoint
import pdb; pdb.set_trace()
# Enhanced logging for specific job
logger = logging.getLogger(f"job.{job_id}")
logger.setLevel(logging.DEBUG)
# Inspect job state
job = await job_manager.get_job(job_id)
print(f"Phase: {job.phase}, Progress: {job.progress_pct}%")
print(f"Providers: {job.provider_updates}")
Frontend Debugging:
// Access store from console
import { useJobStore } from '@/store/jobStore'
const jobs = useJobStore.getState().jobs
console.log(jobs)
// WebSocket debugging
import { wsClient } from '@/services/websocket'
console.log(wsClient.isConnected())
console.log(wsClient.getConnectionState())
// Force reconnect
wsClient.forceReconnect()
Common Debug Scenarios:
Jobs not appearing in queue:
# Check WebSocket connection
# Frontend console: wsClient.getConnectionState()
# Backend logs: sudo journalctl -u brief-extractor | grep "WebSocket"
# Check user isolation
# Backend: Verify user_id matches between job creation and WebSocket
# Log: "Created job ... (user: {user_id})"
# Log: "WebSocket authenticated ... for user: {user_id}"
Consolidation returning empty:
# Check debug files
ls /tmp/consolidation_debug_*.json
cat /tmp/consolidation_debug_20251007_174500.json
# Check OpenAI library version
pip show openai
# Should be >= 1.66.0 for responses.parse() support (1.0.0 predates the Responses API)
Troubleshooting Guide
Common Issues and Resolutions
Issue: "Development Mode" banner shows in production
Symptoms:
- Login page shows yellow "Development Mode" banner
- Authentication bypassed
Root Cause: Backend DEV_MODE=true in .env
Resolution:
# Edit backend .env file
nano .env
# Change: DEV_MODE=false
# Restart backend
sudo systemctl restart brief-extractor
# Verify
curl https://ai-sandbox.oliver.solutions/brief-extractor-back/api/auth/config
# Should return: "devMode": false
Issue: WebSocket connect/disconnect loop
Symptoms:
- Frontend logs show rapid connect/disconnect
- Backend logs:
'str' object has no attribute 'value'
Root Cause: Job phase serialization issue when phase is string instead of enum
Resolution:
Ensure server/jobs/models.py has defensive phase handling:
def to_dict(self):
phase_value = self.phase.value if isinstance(self.phase, JobPhase) else self.phase
return {'phase': phase_value, ...}
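The defensive handling can be checked in isolation. This is a self-contained sketch: the `Job` dataclass is a reduced stand-in for the model in server/jobs/models.py, and the enum values are assumptions (only `JobPhase.QUEUED` is named in this document).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class JobPhase(Enum):
    # Only QUEUED appears in this document; COMPLETED and the values are illustrative.
    QUEUED = "queued"
    COMPLETED = "completed"


@dataclass
class Job:
    id: str
    phase: Union[JobPhase, str]

    def to_dict(self) -> dict:
        # Accept both the enum and an already-serialized string.
        phase_value = self.phase.value if isinstance(self.phase, JobPhase) else self.phase
        return {"id": self.id, "phase": phase_value}


print(Job("a", JobPhase.QUEUED).to_dict())
print(Job("b", "queued").to_dict())
```

Both calls produce the same `phase` string, so a job whose phase was restored from serialized data no longer raises `'str' object has no attribute 'value'` during broadcast.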
Issue: Jobs don't appear in queue after upload
Symptoms:
- Upload succeeds
- Must refresh page to see job
Root Cause: WebSocket user ID mismatch (session ID vs real user ID)
Resolution:
- Ensure WebSocket authenticates with query parameter: ?token=<jwt>
- Backend extracts user ID from token: websocket.args.get('token')
- WebSocket client registered with real user ID, not session ID
Issue: GPT-5 consolidation returns empty object
Symptoms:
- Backend logs: Missing 'assets' key in consolidated response
- Consolidation phase fails
Root Cause: Outdated OpenAI library
Resolution:
# Upgrade OpenAI library
pip install --upgrade openai
# Verify version
pip show openai
# Should be >= 1.66.0 (first release with the Responses API)
# Restart backend
sudo systemctl restart brief-extractor
Issue: 404 errors for assets (index-*.js, index-*.css)
Symptoms:
- Browser console: Loading module blocked - disallowed MIME type
- Assets return HTML (404 page) instead of JS/CSS
Root Cause: Incorrect base path in Vite config
Resolution:
// frontend/vite.config.ts
export default defineConfig({
base: '/brief-extractor/', // Must match deployment path
...
})
// Rebuild
npm run build
// Verify built index.html
cat dist/index.html
// Should show: src="/brief-extractor/assets/index-*.js"
Issue: WebSocket 400 Bad Request
Symptoms:
- Backend logs: GET /ws 1.1" 400
- WebSocket never establishes
Root Cause: Apache not configured for WebSocket upgrade
Resolution:
# Add to Apache config
ProxyPass /brief-extractor-back/ws ws://localhost:8002/ws
ProxyPass /brief-extractor-back/ http://localhost:8002/
ProxyPassReverse /brief-extractor-back/ http://localhost:8002/
# Enable required modules
sudo a2enmod proxy proxy_http proxy_wstunnel rewrite
# Reload Apache
sudo systemctl reload apache2
Issue: CORS errors in browser
Symptoms:
- Browser console: CORS policy: No 'Access-Control-Allow-Origin' header
- API calls fail
Root Cause: ALLOWED_ORIGINS misconfigured
Resolution:
# Edit backend .env
ALLOWED_ORIGINS=https://ai-sandbox.oliver.solutions
# NOT: https://ai-sandbox.oliver.solutions/brief-extractor
# (Don't include a path; an origin is scheme + host + optional port)
# Restart backend
sudo systemctl restart brief-extractor
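The difference between a URL and an origin is easy to get wrong, so it can help to derive the `ALLOWED_ORIGINS` value from the deployment URL. The sketch below uses the standard library's `urlsplit`; `origin_of` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Reduce a URL to scheme://host[:port], the form browsers send in Origin."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}"


print(origin_of("https://ai-sandbox.oliver.solutions/brief-extractor"))
```

Applied to the frontend URL, this yields `https://ai-sandbox.oliver.solutions`, which is the value the browser's `Origin` header will carry.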
Issue: File upload fails with 413 Payload Too Large
Symptoms:
- Large files rejected
- Error: "File size exceeds XMB limit"
Root Cause: Upload size limit too small
Resolution:
# Backend .env
MAX_UPLOAD_SIZE_MB=200 # Increase if needed
# Also check Apache/Nginx limits
# Apache: LimitRequestBody 209715200 # 200MB in bytes
# Nginx: client_max_body_size 200M;
# Restart services
sudo systemctl restart brief-extractor apache2
Advanced Topics
Multi-Model Consolidation Algorithm
Normalization Phase:
# 1. Title Normalization
# Input: "1234 - Location A", "Store B - Hero Slider"
# Output: "Wholesale - Hero Slider (Campaign)"
# 2. Category Normalization
# Input: "Paid Social", "Social Media - Paid"
# Output: "Paid Social"
# 3. Specifications Normalization
# Input: ["1080 × 1080", "1080x1080 px"]
# Output: ["1080x1080"]
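The specifications rule (item 3) can be sketched with a regular expression. This is an illustration of the described input/output pair, not the project's implementation; `normalize_specs` and the pattern are assumptions and only handle pixel dimensions.

```python
import re
from typing import Iterable, List

# Matches "1080 × 1080", "1080x1080 px", "1080 X 1080px", ...
_DIMENSIONS = re.compile(r"^\s*(\d+)\s*[x×]\s*(\d+)\s*(?:px)?\s*$", re.IGNORECASE)


def normalize_specs(specs: Iterable[str]) -> List[str]:
    """Canonicalize pixel dimensions to WIDTHxHEIGHT and drop duplicates."""
    seen = set()
    result: List[str] = []
    for spec in specs:
        match = _DIMENSIONS.match(spec)
        # Non-dimension specs pass through with whitespace trimmed.
        canonical = f"{match.group(1)}x{match.group(2)}" if match else spec.strip()
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


print(normalize_specs(["1080 × 1080", "1080x1080 px"]))
```

Both spellings collapse to the single entry `1080x1080`, which is what makes the later deduplication key stable across models.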
Deduplication Strategy:
# Build deduplication key
key = (
normalized_title,
normalized_category,
media,
tuple(sorted(technical_specifications)),
asset_type
)
# Merge assets with same key
merged_asset = {
**base_asset,
'technical_specifications': union(specs1, specs2),
'language_country_market': union(markets1, markets2),
'quantity': max(qty1, qty2)
}
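A runnable version of the key-and-merge idea might look as follows. It is a sketch under assumptions: the field names `title` and `category` stand for the already-normalized values, the sample assets are invented, and `merge_assets` is a hypothetical function.

```python
from typing import Dict, List, Tuple


def dedup_key(asset: dict) -> Tuple:
    """Assets that agree on all five fields are treated as the same asset."""
    return (
        asset["title"],     # assumed already normalized
        asset["category"],  # assumed already normalized
        asset["media"],
        tuple(sorted(asset["technical_specifications"])),
        asset["asset_type"],
    )


def union(first: List[str], second: List[str]) -> List[str]:
    """Order-preserving union without duplicates."""
    return list(dict.fromkeys([*first, *second]))


def merge_assets(assets: List[dict]) -> List[dict]:
    merged: Dict[Tuple, dict] = {}
    for asset in assets:
        key = dedup_key(asset)
        if key not in merged:
            merged[key] = dict(asset)  # shallow copy so inputs stay untouched
            continue
        base = merged[key]
        base["language_country_market"] = union(
            base["language_country_market"], asset["language_country_market"]
        )
        base["quantity"] = max(base["quantity"], asset["quantity"])
    return list(merged.values())


shared = {"title": "Hero Slider", "category": "Paid Social",
          "media": "Digital", "asset_type": "Static"}  # invented sample values
from_model_a = {**shared, "technical_specifications": ["1080x1080"],
                "language_country_market": ["US"], "quantity": 2}
from_model_b = {**shared, "technical_specifications": ["1080x1080"],
                "language_country_market": ["UK"], "quantity": 3}
result = merge_assets([from_model_a, from_model_b])
print(result[0]["language_country_market"], result[0]["quantity"])
```

Because the sorted specifications are part of the key, two assets that share a key already list the same specifications, so this sketch only needs to merge markets and quantity.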
Quality Enhancement:
# For each field, choose best value from all models
reference_material = longest([model1.ref, model2.ref, model3.ref])
creative_direction = most_detailed([model1.dir, model2.dir, model3.dir])
Custom Prompt Engineering
Prompt Structure:
prompts/
├── system_multi_perspective.txt # System message for analysis
├── multi_perspective_analysis.txt # User prompt template
├── consolidation_analysis.txt # Consolidation strategy
└── universal_schema.json # Output schema
Customization:
# Edit prompts
nano prompts/multi_perspective_analysis.txt
# Changes take effect immediately (loaded at runtime)
# No need to restart backend
Prompt Variables:
# multi_perspective_analysis.txt
# Uses: {doc_type}, {document_content}
# consolidation_analysis.txt
# Uses: {models_results}
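Filling these variables can be sketched as below. How the project actually substitutes them is not shown in this document; this example uses a regular expression instead of `str.format` so that other literal braces in a template are left alone, and `render_prompt` is a hypothetical name.

```python
import re
from typing import Dict

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template: str, variables: Dict[str, str]) -> str:
    """Fill {name} placeholders, failing loudly if any variable is missing."""
    required = set(_PLACEHOLDER.findall(template))
    missing = required - set(variables)
    if missing:
        raise KeyError(f"missing prompt variables: {sorted(missing)}")
    # Only {identifier} tokens are replaced; braces in JSON examples such as
    # {"assets": []} do not match the pattern and pass through unchanged.
    return _PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)


template = "Document type: {doc_type}\n\n{document_content}"
prompt = render_prompt(
    template,
    {"doc_type": "creative brief", "document_content": "Hero banner 1080x1080"},
)
print(prompt)
```

Failing on a missing variable at render time is cheaper than discovering a literal `{doc_type}` in a prompt after the model call has been paid for.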
Adding New AI Models
Step 1: Create Provider Class
# core/llm_service/new_provider.py
class NewProvider(BaseLLMProvider):
async def generate_response(self, messages, schema, **kwargs):
# Implementation
pass
Step 2: Register in Provider Manager
# core/llm_service/provider_manager.py
elif provider_name == 'newprovider':
return NewProvider(model_name=model_name)
Step 3: Add Configuration
# core/config.py
MODEL_MAPPINGS = {
'newprovider-model1': ('newprovider', 'model-1'),
}
PRICING = {
'newprovider-model1': {
'input': 1.00,
'output': 3.00
}
}
Step 4: Update Frontend
# server/jobs/manager.py (backend map that supplies model metadata to the frontend)
model_info_map = {
'newprovider-model1': ModelInfo(
key='newprovider-model1',
name='New Model',
provider='NewProvider',
...
)
}
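Steps 1 through 3 can be exercised together in a small standalone sketch. Everything here is a stand-in: the real `BaseLLMProvider` interface is not reproduced in this document, the stub returns a canned response instead of calling an SDK, and a registry dict replaces the `elif` chain from Step 2 for brevity.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple, Type


class BaseLLMProvider(ABC):
    """Hypothetical stand-in for the project's base class."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    @abstractmethod
    async def generate_response(self, messages: List[dict], schema: dict, **kwargs: Any) -> dict:
        ...


class NewProvider(BaseLLMProvider):
    async def generate_response(self, messages, schema, **kwargs):
        # A real provider would call its SDK here; this stub returns an empty result.
        return {"model": self.model_name, "assets": []}


# Model key -> (provider name, provider-side model name), as in Step 3.
MODEL_MAPPINGS: Dict[str, Tuple[str, str]] = {
    "newprovider-model1": ("newprovider", "model-1"),
}
# Registry used here in place of the elif chain from Step 2.
PROVIDERS: Dict[str, Type[BaseLLMProvider]] = {"newprovider": NewProvider}


def create_provider(model_key: str) -> BaseLLMProvider:
    provider_name, model_name = MODEL_MAPPINGS[model_key]
    return PROVIDERS[provider_name](model_name=model_name)


provider = create_provider("newprovider-model1")
response = asyncio.run(provider.generate_response(messages=[], schema={}))
print(response)
```

Running a stub like this before wiring in the real SDK confirms that the model key resolves to the intended class and model name.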
Appendix
File Location Reference
Configuration Files:
- Backend config: /adi-o3-multipass/.env
- Frontend config: /adi-o3-multipass/frontend/.env
- Server config: /adi-o3-multipass/server/config_runtime.py
- Core config: /adi-o3-multipass/core/config.py
- Vite config: /adi-o3-multipass/frontend/vite.config.ts
Prompt Templates:
- System prompt: /adi-o3-multipass/prompts/system_multi_perspective.txt
- Analysis prompt: /adi-o3-multipass/prompts/multi_perspective_analysis.txt
- Consolidation prompt: /adi-o3-multipass/prompts/consolidation_analysis.txt
- Schema: /adi-o3-multipass/prompts/universal_schema.json
Data Directories (Production):
- Uploads: /var/www/html/brief-extractor/backend/server/data/uploads/
- Outputs: /var/www/html/brief-extractor/backend/server/data/outputs/
- Debug files: /tmp/consolidation_debug_*.json, /tmp/openai_debug_*.txt
Logs:
- Systemd journal: journalctl -u brief-extractor
- Application logs: Structured JSON to stdout (captured by systemd)
Port Reference
| Service | Port | Protocol | Purpose |
|---|---|---|---|
| Frontend Dev Server | 3000 | HTTP | Local development |
| Backend Dev Server | 8000 | HTTP | Local development |
| Backend Production | 8002 | HTTP | Production (behind Apache) |
| Apache Web Server | 443 | HTTPS | Public-facing |
Key URLs
Production:
- Frontend: https://ai-sandbox.oliver.solutions/brief-extractor/
- Backend API: https://ai-sandbox.oliver.solutions/brief-extractor-back/api
- Backend WS: wss://ai-sandbox.oliver.solutions/brief-extractor-back/ws
- Health Check: https://ai-sandbox.oliver.solutions/brief-extractor-back/health
Development:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000/api
- Backend WS: ws://localhost:8000/ws
- Health Check: http://localhost:8000/health
External Service Dependencies
| Service | Purpose | API Key Variable | Endpoint |
|---|---|---|---|
| OpenAI | GPT-5 model access | OPENAI_API_KEY | https://api.openai.com/v1/responses |
| Anthropic | Claude models | ANTHROPIC_API_KEY | https://api.anthropic.com/v1/messages |
| Google AI | Gemini models | GOOGLE_API_KEY | https://generativelanguage.googleapis.com |
| LlamaCloud | Document parsing | LLAMACLOUD_API_KEY | https://api.cloud.llamaindex.ai |
| Microsoft Azure AD | Authentication | (MSAL config) | https://login.microsoftonline.com |
Version Compatibility
Minimum Versions:
- Python: 3.13+
- Node.js: 18+
- OpenAI library: 1.66.0+ (for responses API)
- Anthropic library: 0.67.0+ (for async client)
- Google GenAI: 0.4.0+ (for new SDK)
Browser Support:
- Chrome/Edge: 90+
- Firefox: 88+
- Safari: 14+
- WebSocket support required
Security Best Practices
Production Checklist
- DEV_MODE=false in backend .env
- SECURE_COOKIES=true in backend .env
- HTTPS_ONLY=true in backend .env
- Strong SESSION_SECRET (min 32 random chars)
- ALLOWED_ORIGINS restricted to production domain
- Apache/Nginx configured with SSL certificates
- WebSocket proxy configured with SSL
- File upload size limits enforced
- CORS properly configured
- API keys rotated regularly
- Systemd service runs as restricted user (www-data)
- File permissions: uploads/outputs not world-readable
- Regular security updates (pip, npm)
- Monitoring and alerting configured
- Backup strategy for job data (if needed)
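For the SESSION_SECRET item in the checklist, a suitable value can be generated with Python's `secrets` module. This is one common approach rather than a project requirement; any cryptographically secure generator producing at least 32 random characters would do.

```python
import secrets

# token_urlsafe(32) draws 32 random bytes and encodes them as URL-safe text,
# which yields 43 characters and so satisfies the 32-character minimum.
session_secret = secrets.token_urlsafe(32)
print(f"SESSION_SECRET={session_secret}")
```

The printed line can be pasted into the backend .env file. The value should be generated once per environment and never committed to version control.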
Data Privacy Considerations
User Data:
- User emails/names from Azure AD (PII)
- Uploaded documents may contain confidential business information
- Generated CSVs contain extracted marketing data
Retention Policy:
- Jobs auto-deleted after FILE_RETENTION_HOURS (24h default)
- Uploaded files and CSVs deleted with job
- Logs may contain user identifiers (consider log retention)
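A retention sweep of this kind can be sketched as follows. The project's actual cleanup routine is not shown in this document; `purge_expired` is a hypothetical function that judges age by file modification time and operates on a temporary directory for demonstration.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List

FILE_RETENTION_HOURS = 24  # default named above


def purge_expired(directory: Path, retention_hours: float = FILE_RETENTION_HOURS) -> List[str]:
    """Delete files whose modification time is older than the retention window."""
    cutoff = time.time() - retention_hours * 3600
    removed: List[str] = []
    for path in directory.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "old.csv").write_text("stale")
    (root / "new.csv").write_text("fresh")
    two_days_ago = time.time() - 48 * 3600
    os.utime(root / "old.csv", (two_days_ago, two_days_ago))  # backdate the file
    removed = purge_expired(root)
    remaining = sorted(p.name for p in root.iterdir())
    print(removed, remaining)
```

Only the backdated file is removed. A production sweep would also need to drop the matching job record so that the API does not list a job whose files are gone.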
GDPR Compliance:
- User data isolated per user_id
- Automatic data deletion (24h retention)
- Right to deletion: DELETE /api/jobs/{id}
- Data portability: CSV download
- Audit trail: Structured logs with user_id
Change Log
Version 2.0 (October 2025)
New Features:
- Multi-tenant architecture with user isolation
- Microsoft Azure AD authentication (MSAL)
- Real-time WebSocket communication
- Enhanced logging with conditional verbosity
- Improved error handling and debug artifacts
Improvements:
- PKCE authentication flow (more secure)
- Parallel model execution (2-3x faster)
- Smart WebSocket reconnection
- Comprehensive API documentation
Bug Fixes:
- GPT-5 consolidation empty response (OpenAI library upgrade)
- WebSocket authentication with query parameters
- Job phase serialization (enum vs string handling)
- Frontend base path configuration for subpath deployment
Version 1.0 (September 2025)
- Initial release with multi-model analysis
- LlamaParser integration
- Basic web interface
- CLI support
Support and Contact
Documentation Updates: This document should be updated whenever:
- Architecture changes are made
- New features are added
- Configuration options change
- Deployment procedures are modified
Getting Help:
- Review CLAUDE.md for project overview
- Check troubleshooting guide for common issues
- Review backend logs: journalctl -u brief-extractor
- Review frontend console for client-side errors
- Check debug artifacts in /tmp/ for detailed diagnostics
Document End