
# Contract Analysis Tool v2.0 - Technical Documentation

## Table of Contents

1. [System Overview](#system-overview)
2. [Architecture](#architecture)
3. [Technology Stack](#technology-stack)
4. [Data Models](#data-models)
5. [API Documentation](#api-documentation)
6. [Authentication & Authorization](#authentication--authorization)
7. [Document Processing Pipeline](#document-processing-pipeline)
8. [RAG System & Chat Implementation](#rag-system--chat-implementation)
9. [User Flows](#user-flows)
10. [Frontend Structure](#frontend-structure)
11. [Backend Structure](#backend-structure)
12. [Database Schema](#database-schema)
13. [Deployment Architecture](#deployment-architecture)
14. [Security Features](#security-features)
15. [Performance Optimizations](#performance-optimizations)
16. [Conclusion](#conclusion)

## System Overview

The Contract Analysis Tool v2.0 is a production-ready Retrieval-Augmented Generation (RAG) application designed for intelligent contract analysis and document Q&A. The system enables organizations to upload, process, and query legal documents using natural language processing capabilities powered by OpenAI's GPT-4 and LlamaIndex.

### Key Features

- **Document Management**: Upload and organize legal documents into searchable indices
- **Intelligent Q&A**: Natural language querying with contextual responses
- **Role-Based Access Control**: Admin and user role management with index-level permissions
- **Real-time Processing**: Asynchronous document processing with progress tracking
- **Multi-format Support**: PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF
- **Vector Search**: ChromaDB-powered semantic search with embedding similarity
- **Chat Context**: Conversation continuity using up to 10 previous messages from a 24-hour rolling window
- **SSO Integration**: Azure Active Directory integration with local fallback
- **Admin Dashboard**: Comprehensive system monitoring and management tools

## Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        UI[React Frontend]
        Mobile[Mobile Browser]
    end

    subgraph "API Gateway"
        Gateway[FastAPI Application]
        Auth[JWT Authentication]
        CORS[CORS Middleware]
    end

    subgraph "Business Logic"
        AuthSvc[Auth Service]
        DocSvc[Document Service]
        RAGSvc[RAG Service]
        ChatSvc[Chat Service]
        AdminSvc[Admin Service]
    end

    subgraph "Data Storage"
        MongoDB[(MongoDB)]
        Redis[(Redis Cache)]
        ChromaDB[(ChromaDB Vector Store)]
        FileSystem[File System Storage]
    end

    subgraph "External Services"
        OpenAI[OpenAI API]
        LlamaParse[LlamaParse API]
        AzureAD[Azure AD SSO]
    end

    UI --> Gateway
    Mobile --> Gateway
    Gateway --> Auth
    Gateway --> CORS

    Gateway --> AuthSvc
    Gateway --> DocSvc
    Gateway --> RAGSvc
    Gateway --> ChatSvc
    Gateway --> AdminSvc

    AuthSvc --> MongoDB
    AuthSvc --> AzureAD
    DocSvc --> MongoDB
    DocSvc --> FileSystem
    RAGSvc --> ChromaDB
    RAGSvc --> OpenAI
    ChatSvc --> MongoDB
    ChatSvc --> Redis
    AdminSvc --> MongoDB

    DocSvc --> LlamaParse
    RAGSvc --> LlamaParse
```

### System Architecture Principles

- **Modular Service Layer**: Business logic is split into dedicated services (auth, documents, RAG, chat, admin) inside one FastAPI application, with clear separation of concerns
- **Async Processing**: Non-blocking operations for document processing and embedding generation
- **Caching Strategy**: Multi-layer caching with Redis for API responses and application state
- **Scalable Storage**: Hybrid storage approach combining structured (MongoDB), cache (Redis), and vector (ChromaDB) databases
- **Security-First**: JWT-based authentication with role-based access control and input validation

## Technology Stack

### Backend Technologies

```mermaid
graph LR
    subgraph "Core Framework"
        FastAPI[FastAPI 0.104+]
        Python[Python 3.11+]
        Pydantic[Pydantic v2]
    end

    subgraph "AI/ML Stack"
        LlamaIndex[LlamaIndex]
        OpenAI[OpenAI GPT-4]
        Embeddings[OpenAI Embeddings]
        LlamaParse[LlamaParse]
    end

    subgraph "Data Layer"
        MongoDB[MongoDB]
        Motor[Motor Async Driver]
        ChromaDB[ChromaDB]
        Redis[Redis]
    end

    subgraph "Authentication"
        JWT[JWT Tokens]
        MSAL[MSAL Azure AD]
        Passlib[Passlib Hashing]
    end
```

### Frontend Technologies

```mermaid
graph LR
    subgraph "Core Framework"
        React[React 18+]
        Vite[Vite Build Tool]
        JavaScript[JavaScript ES6+]
    end

    subgraph "UI/UX"
        TailwindCSS[Tailwind CSS]
        Headless[Headless UI]
        Heroicons[Hero Icons]
    end

    subgraph "State Management"
        Context[React Context]
        Hooks[React Hooks]
        LocalStorage[Local Storage]
    end

    subgraph "HTTP & Auth"
        Axios[Axios HTTP Client]
        MSALReact["@azure/msal-react"]
        ReactRouter[React Router]
    end
```

## Data Models

### User Model

```mermaid
erDiagram
    User {
        ObjectId _id PK
        EmailStr email
        UserRole role "admin|user"
        boolean is_active
        AuthMethod auth_method "local|sso"
        string hashed_password "optional for SSO"
        string sso_provider
        string sso_user_id
        string sso_email
        string sso_name
        dict sso_attributes
        datetime last_sso_login
        list index_access "accessible index IDs"
        datetime created_at
        datetime updated_at
    }
```

### Document Model

```mermaid
erDiagram
    Document {
        ObjectId _id PK
        string filename
        string original_filename
        int file_size
        string content_type
        string index_id FK
        ObjectId uploaded_by FK
        string file_path
        string processing_status "pending|processing|completed|failed"
        dict metadata
        string parsed_text
        list text_chunks
        string embedding_status "pending|processing|completed|failed"
        int chunk_count
        list vector_ids
        dict contract_summary
        string summary_status "pending|processing|completed|failed"
        datetime summary_created_at
        datetime created_at
        datetime updated_at
    }
```

### Index Model

```mermaid
erDiagram
    Index {
        ObjectId _id PK
        string name
        string description
        string index_id "unique identifier"
        ObjectId created_by FK
        string status "active|inactive|deleted"
        int document_count
        dict settings
        string vector_store_path
        string embedding_model "default: text-embedding-3-small"
        int chunk_size "default: 1000"
        int chunk_overlap "default: 200"
        datetime created_at
        datetime updated_at
    }
```

### Chat Message Model

```mermaid
erDiagram
    ChatMessage {
        ObjectId _id PK
        ObjectId user_id FK
        string index_id FK
        string query
        string response
        dict debug_info
        float response_time
        boolean cached
        list sources
        string context_used
        boolean deleted_by_user
        datetime created_at
        datetime updated_at
    }
```

### Entity Relationships

```mermaid
erDiagram
    User ||--o{ Index : "creates"
    User ||--o{ Document : "uploads"
    User ||--o{ ChatMessage : "sends"
    Index ||--o{ Document : "contains"
    Index ||--o{ ChatMessage : "queries"

    User {
        ObjectId _id PK
        EmailStr email
        UserRole role
        list index_access
    }

    Index {
        ObjectId _id PK
        string index_id UK
        string name
        ObjectId created_by FK
    }

    Document {
        ObjectId _id PK
        string filename
        string index_id FK
        ObjectId uploaded_by FK
    }

    ChatMessage {
        ObjectId _id PK
        ObjectId user_id FK
        string index_id FK
        string query
        string response
    }
```

## API Documentation

### Authentication Endpoints

| Method | Endpoint | Description | Auth Required |
|--------|----------|-------------|---------------|
| POST | `/api/v1/auth/login` | Local user authentication | No |
| POST | `/api/v1/auth/register` | User registration | No |
| GET | `/api/v1/auth/me` | Get current user info | Yes |
| POST | `/api/v1/auth/refresh` | Refresh JWT token | Yes |
| POST | `/api/v1/auth/logout` | User logout | No |
| GET | `/api/v1/auth/sso/config` | Get SSO configuration | No |
| POST | `/api/v1/auth/sso/validate` | Validate SSO token | No |
| POST | `/api/v1/auth/login/local` | Backup admin login | No |
| POST | `/api/v1/auth/init-users` | Initialize default users | No |

### Document Management Endpoints

| Method | Endpoint | Description | Auth Required | Role |
|--------|----------|-------------|---------------|------|
| POST | `/api/v1/documents/upload` | Upload documents to index | Yes | User/Admin |
| GET | `/api/v1/documents/{index_id}` | List documents in index | Yes | User/Admin |

### Index Management Endpoints

| Method | Endpoint | Description | Auth Required | Role |
|--------|----------|-------------|---------------|------|
| POST | `/api/v1/indices/create` | Create new document index | Yes | User/Admin |
| GET | `/api/v1/indices/` | List user's accessible indices | Yes | User/Admin |

### Chat Endpoints

| Method | Endpoint | Description | Auth Required | Role |
|--------|----------|-------------|---------------|------|
| POST | `/api/v1/chat/query` | Natural language document query | Yes | User/Admin |

### Admin Endpoints

| Method | Endpoint | Description | Auth Required | Role |
|--------|----------|-------------|---------------|------|
| GET | `/api/v1/admin/stats` | System statistics | Yes | Admin |
| POST | `/api/v1/admin/documents/upload-single` | Upload single document | Yes | Admin |
| POST | `/api/v1/admin/documents/upload-multiple` | Upload multiple documents | Yes | Admin |
| GET | `/api/v1/admin/documents/{index_id}` | Get index documents | Yes | Admin |
| POST | `/api/v1/admin/documents/{document_id}/reprocess` | Reprocess document | Yes | Admin |
| DELETE | `/api/v1/admin/documents/{document_id}` | Delete document | Yes | Admin |
| GET | `/api/v1/admin/indices` | Get all indices | Yes | Admin |
| POST | `/api/v1/admin/indices/create` | Create new index | Yes | Admin |
| POST | `/api/v1/admin/chat/query` | Admin RAG query interface | Yes | Admin |

## Authentication & Authorization

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant FastAPI
    participant MongoDB
    participant AzureAD

    Note over User,AzureAD: SSO Authentication Flow
    User->>Frontend: Access Application
    Frontend->>FastAPI: Check SSO Config
    FastAPI-->>Frontend: SSO Configuration
    Frontend->>AzureAD: Redirect to SSO Login
    AzureAD->>Frontend: SSO Token
    Frontend->>FastAPI: Validate SSO Token
    FastAPI->>AzureAD: Verify Token
    AzureAD-->>FastAPI: User Claims
    FastAPI->>MongoDB: Create/Update User
    FastAPI-->>Frontend: Internal JWT Token

    Note over User,AzureAD: Local Authentication Flow
    User->>Frontend: Local Login Form
    Frontend->>FastAPI: Email/Password
    FastAPI->>MongoDB: Verify Credentials
    MongoDB-->>FastAPI: User Data
    FastAPI-->>Frontend: JWT Token + User Info
```

### Authentication Methods

1. **Single Sign-On (SSO)**
   - Azure Active Directory integration
   - Automatic user provisioning
   - Role mapping from AD groups
   - Token validation and refresh

2. **Local Authentication**
   - Email/password authentication
   - Bcrypt password hashing
   - JWT token-based sessions
   - Backup admin access

### Authorization Levels

```mermaid
graph TD
    A[User Request] --> B{Authenticated?}
    B -->|No| C[Return 401 Unauthorized]
    B -->|Yes| D{Valid Role?}
    D -->|No| E[Return 403 Forbidden]
    D -->|Yes| F{Index Access?}
    F -->|No| G[Return 403 Forbidden]
    F -->|Yes| H[Process Request]

    subgraph "Role Hierarchy"
        I[Admin] --> J[Full System Access]
        K[User] --> L[Restricted Access]
    end
```
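
The decision order in the flow above can be written as one small pure function. This is a minimal TypeScript sketch for illustration only (the backend is Python/FastAPI); the `CurrentUser` and `RequestRequirements` shapes and the `authorize` name are hypothetical, with `indexAccess` standing in for the User model's `index_access` field. Following the role hierarchy, admins skip the index-level check.

```typescript
type Role = "admin" | "user";

// Hypothetical view of the authenticated user (mirrors User.role and User.index_access).
interface CurrentUser {
  role: Role;
  indexAccess: string[];
}

// Hypothetical description of what an endpoint requires.
interface RequestRequirements {
  adminOnly: boolean;
  indexId?: string; // set when the request targets one index
}

type AuthDecision = 200 | 401 | 403;

// Same order as the flow chart: authenticated -> valid role -> index access -> process (200).
function authorize(user: CurrentUser | null, req: RequestRequirements): AuthDecision {
  if (user === null) return 401; // no valid token
  if (req.adminOnly && user.role !== "admin") return 403; // role check
  if (req.indexId !== undefined && user.role !== "admin" && !user.indexAccess.includes(req.indexId)) {
    return 403; // index-level permission check; admins have full access
  }
  return 200;
}

const analyst: CurrentUser = { role: "user", indexAccess: ["contracts-2024"] };
console.log(authorize(analyst, { adminOnly: false, indexId: "contracts-2024" })); // 200
console.log(authorize(analyst, { adminOnly: false, indexId: "hr-policies" })); // 403
console.log(authorize(analyst, { adminOnly: true })); // 403
```

Returning the status code keeps the check independent of any web framework; a route handler would translate 401 and 403 into the corresponding error responses.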

## Document Processing Pipeline

```mermaid
flowchart TD
    A[User Uploads Document] --> B[File Validation]
    B --> C{Valid File?}
    C -->|No| D[Return Error]
    C -->|Yes| E[Store File to Disk]
    E --> F[Create Document Record]
    F --> G[Update Status: Processing]

    G --> H[LlamaParse Processing]
    H --> I{Parse Success?}
    I -->|No| J[Update Status: Failed]
    I -->|Yes| K[Extract Text Content]
    K --> L[Text Chunking]
    L --> M[Generate Embeddings]
    M --> N[Store in ChromaDB]
    N --> O[Update Vector IDs]
    O --> P[Update Status: Completed]

    subgraph "Async Processing"
        H
        I
        K
        L
        M
        N
        O
        P
    end

    subgraph "Status Tracking"
        Q[pending] --> R[processing]
        R --> S[completed]
        R --> T[failed]
    end
```
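
The transitions in the Status Tracking subgraph can be enforced with a lookup table, so that a document never jumps, for example, from `pending` straight to `completed`. The TypeScript sketch below is illustrative only (the backend is Python), and `advanceStatus` is a hypothetical helper. The Document model uses the same four values for `embedding_status` and `summary_status`, so a similar guard could apply to those fields too.

```typescript
type ProcessingStatus = "pending" | "processing" | "completed" | "failed";

// Only the transitions drawn in the "Status Tracking" subgraph.
// Admin reprocessing (POST /api/v1/admin/documents/{document_id}/reprocess)
// is left out because its status handling is not documented here.
const ALLOWED_TRANSITIONS: Record<ProcessingStatus, readonly ProcessingStatus[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  completed: [],
  failed: [],
};

function advanceStatus(current: ProcessingStatus, next: ProcessingStatus): ProcessingStatus {
  if (!ALLOWED_TRANSITIONS[current].includes(next)) {
    throw new Error(`invalid status transition: ${current} -> ${next}`);
  }
  return next;
}

// Happy path for one upload: pending -> processing -> completed.
let docStatus: ProcessingStatus = "pending";
docStatus = advanceStatus(docStatus, "processing");
docStatus = advanceStatus(docStatus, "completed");
console.log(docStatus); // completed
```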

### Document Processing Phases

1. **Upload Phase**
   - File validation (type, size, format)
   - Virus scanning (if configured)
   - File system storage
   - Database record creation

2. **Processing Phase**
   - LlamaParse API integration
   - Text extraction and cleaning
   - Content chunking strategy
   - Metadata extraction

3. **Embedding Phase**
   - OpenAI embedding generation
   - Vector storage in ChromaDB
   - Index organization
   - Completion status updates

### Supported File Formats

| Format | Extension | Processing Method | Max Size |
|--------|-----------|------------------|----------|
| PDF | .pdf | LlamaParse | 50MB |
| Word Document | .docx, .doc | LlamaParse | 50MB |
| Text | .txt | Direct parsing | 10MB |
| CSV | .csv | Structured parsing | 25MB |
| JSON | .json | Structured parsing | 25MB |
| HTML | .html, .htm | Content extraction | 10MB |
| Markdown | .md | Direct parsing | 10MB |
| RTF | .rtf | Text extraction | 25MB |
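
The extension and size checks from the table can be expressed as a lookup, sketched here in TypeScript for illustration (the backend is Python). `checkUpload` and the `UploadCheck` result type are hypothetical, and reading "MB" as 2^20 bytes is an assumption. Extension checks alone do not prove a file's type, since names are easy to spoof, so the server would still need to inspect the content before parsing.

```typescript
const MIB = 1024 * 1024; // assumption: "MB" in the table means 2^20 bytes

// Per-extension size limits copied from the table above.
const MAX_BYTES_BY_EXTENSION = new Map<string, number>([
  [".pdf", 50 * MIB],
  [".docx", 50 * MIB],
  [".doc", 50 * MIB],
  [".txt", 10 * MIB],
  [".csv", 25 * MIB],
  [".json", 25 * MIB],
  [".html", 10 * MIB],
  [".htm", 10 * MIB],
  [".md", 10 * MIB],
  [".rtf", 25 * MIB],
]);

type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkUpload(filename: string, sizeBytes: number): UploadCheck {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot).toLowerCase() : ""; // case-insensitive extension
  const limit = MAX_BYTES_BY_EXTENSION.get(ext);
  if (limit === undefined) {
    return { ok: false, reason: `unsupported file type: ${ext || "(no extension)"}` };
  }
  if (sizeBytes <= 0) return { ok: false, reason: "empty file" };
  if (sizeBytes > limit) {
    return { ok: false, reason: `${ext} files are limited to ${limit / MIB} MB` };
  }
  return { ok: true };
}

console.log(checkUpload("Master_Services_Agreement.PDF", 2 * MIB)); // { ok: true }
console.log(checkUpload("notes.txt", 12 * MIB)); // { ok: false, reason: '.txt files are limited to 10 MB' }
```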

## RAG System & Chat Implementation

```mermaid
sequenceDiagram
    participant User
    participant ChatAPI
    participant ContextService
    participant RAGService
    participant ChromaDB
    participant OpenAI
    participant MongoDB

    User->>ChatAPI: Submit Query
    ChatAPI->>ContextService: Get Conversation Context
    ContextService->>MongoDB: Fetch Recent Messages
    MongoDB-->>ContextService: Last 10 Messages (24h)
    ContextService-->>ChatAPI: Context Summary

    ChatAPI->>RAGService: Process Query with Context
    RAGService->>ChromaDB: Vector Similarity Search
    ChromaDB-->>RAGService: Relevant Documents
    RAGService->>OpenAI: Generate Response
    OpenAI-->>RAGService: AI Response
    RAGService-->>ChatAPI: Response + Sources

    ChatAPI->>MongoDB: Store Chat Message
    ChatAPI-->>User: Response + Context Info
```

### Chat Context System

The chat service maintains conversation context so that follow-up questions can build on earlier turns:

#### Context Window Management
- **Time Window**: 24-hour rolling window for context relevance
- **Message Limit**: Maximum 10 previous messages to prevent token overflow
- **Smart Selection**: Prioritizes recent and relevant messages for context

#### Context Generation Process
1. **Message Retrieval**: Fetch recent messages within time window
2. **Relevance Filtering**: Score messages based on query similarity
3. **Context Summarization**: Generate concise context summary
4. **Token Management**: Ensure context fits within model limits
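
Steps 1 and 4 can be sketched as follows, in TypeScript for illustration (the backend is Python). Relevance filtering (step 2) and summarization (step 3) are not shown: here recency alone decides which turns survive, and the token budget is approximated by a character budget whose value, `MAX_CONTEXT_CHARS`, is hypothetical. `StoredMessage` and `buildConversationContext` are likewise hypothetical names; `deletedByUser` mirrors the ChatMessage model's `deleted_by_user` flag.

```typescript
// Hypothetical in-memory view of a stored chat_messages document.
interface StoredMessage {
  query: string;
  response: string;
  createdAt: Date;
  deletedByUser?: boolean;
}

const CONTEXT_WINDOW_MS = 24 * 60 * 60 * 1000; // 24-hour rolling window
const MAX_CONTEXT_MESSAGES = 10; // documented message limit
const MAX_CONTEXT_CHARS = 8000; // assumption: stand-in for a real token budget

function buildConversationContext(history: StoredMessage[], now: Date): string {
  const recent = history
    .filter((m) => !m.deletedByUser)
    .filter((m) => now.getTime() - m.createdAt.getTime() <= CONTEXT_WINDOW_MS)
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime()) // newest first
    .slice(0, MAX_CONTEXT_MESSAGES);

  const turns: string[] = [];
  let used = 0;
  for (const m of recent) {
    const turn = `User: ${m.query}\nAssistant: ${m.response}`;
    if (used + turn.length > MAX_CONTEXT_CHARS) break; // older turns are dropped first
    turns.push(turn);
    used += turn.length;
  }
  return turns.reverse().join("\n\n"); // chronological order for the prompt
}

const now = new Date("2024-05-01T12:00:00Z");
const ctx = buildConversationContext(
  [
    { query: "Who are the parties?", response: "Acme Corp and Globex Ltd.", createdAt: new Date("2024-05-01T11:00:00Z") },
    { query: "Old question", response: "Old answer", createdAt: new Date("2024-04-29T09:00:00Z") },
  ],
  now,
);
console.log(ctx); // only the 11:00 turn; the 2024-04-29 message is outside the 24-hour window
```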

#### Caching Strategy
```mermaid
graph TD
    A[User Query] --> B{Has Context?}
    B -->|No| C[Simple Query Cache]
    B -->|Yes| D[Dynamic Response]
    C --> E{Cache Hit?}
    E -->|Yes| F[Return Cached Response]
    E -->|No| G[Generate & Cache Response]
    D --> H[Generate Contextual Response]
    G --> I[Return Response]
    H --> I
```
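
The decision above, combined with the `chat:{index_id}:{query_hash}` key format from the Redis cache structure, might look like this TypeScript sketch (for illustration; the backend is Python). A `Map` stands in for Redis, and both the query normalization and the FNV-1a hash are assumptions; a production service would more likely use a cryptographic hash such as SHA-256 and set a TTL on each key. `ChatResponseCache` and `chatCacheKey` are hypothetical names.

```typescript
interface CachedAnswer {
  response: string;
  sources: string[];
}

// 32-bit FNV-1a over UTF-16 code units; a dependency-free stand-in for query_hash.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Assumption: trivial whitespace and case differences should share a cache entry.
function chatCacheKey(indexId: string, query: string): string {
  const normalized = query.trim().toLowerCase().replace(/\s+/g, " ");
  return `chat:${indexId}:${fnv1a32(normalized)}`;
}

class ChatResponseCache {
  private readonly store = new Map<string, CachedAnswer>(); // stand-in for Redis

  answer(
    indexId: string,
    query: string,
    hasContext: boolean,
    generate: () => CachedAnswer,
  ): { answer: CachedAnswer; cached: boolean } {
    // Context-dependent answers are never cached: the same words can mean
    // different things in different conversations.
    if (hasContext) return { answer: generate(), cached: false };

    const key = chatCacheKey(indexId, query);
    const hit = this.store.get(key);
    if (hit !== undefined) return { answer: hit, cached: true };

    const fresh = generate();
    this.store.set(key, fresh);
    return { answer: fresh, cached: false };
  }
}
```

The returned `cached` flag corresponds to the `cached` field stored on each ChatMessage. A real implementation would be asynchronous, since both Redis and the model call involve I/O.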

### Vector Search Implementation

The RAG system uses ChromaDB for efficient vector similarity search:

#### Embedding Strategy
- **Model**: OpenAI `text-embedding-3-small` (1536 dimensions)
- **Chunk Size**: 1000 characters with 200 character overlap
- **Similarity Metric**: Cosine similarity with configurable top-k results
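
A character-based sliding window matching the numbers above (1000-character chunks, 200-character overlap, so each window starts 800 characters after the previous one) can be sketched in TypeScript as follows. It illustrates the arithmetic only and is not the project's splitter: LlamaIndex's `SentenceSplitter`, for instance, counts chunk size in tokens and prefers sentence boundaries, so if the project relies on it, real chunk edges will differ.

```typescript
// Splits text into fixed-size character windows that overlap by `overlap` characters.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) throw new Error("overlap must be in [0, chunkSize)");

  const step = chunkSize - overlap; // 800 with the documented values
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // this window already reaches the end
  }
  return chunks;
}

const sample = "x".repeat(2000);
console.log(chunkText(sample).map((c) => c.length)); // [ 1000, 1000, 400 ]
```

The overlap means a clause that straddles a chunk boundary still appears whole in at least one chunk, provided it is no longer than 200 characters.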

#### Query Processing
1. **Query Embedding**: Convert natural language query to vector
2. **Similarity Search**: Find most relevant document chunks
3. **Result Ranking**: Score and rank results by relevance
4. **Context Assembly**: Combine search results with conversation context
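
Steps 2 and 3 amount to scoring every chunk vector against the query vector and keeping the best `k`. The brute-force TypeScript sketch below uses cosine similarity as stated above and is for illustration only: ChromaDB answers such queries from an approximate (HNSW) index, and its distance function is configured per collection (for example through the `hnsw:space` metadata key), so in practice the ranking comes back from the vector store. `topK` and the chunk IDs are hypothetical.

```typescript
interface ChunkVector {
  chunkId: string;
  vector: number[];
}

interface ScoredChunk {
  chunkId: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors; treat as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query and returns the k highest-scoring ones.
function topK(query: number[], chunks: ChunkVector[], k: number): ScoredChunk[] {
  return chunks
    .map((c) => ({ chunkId: c.chunkId, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 2-dimensional vectors; text-embedding-3-small vectors have 1536 dimensions.
const hits = topK([1, 0], [
  { chunkId: "termination-clause", vector: [0.9, 0.1] },
  { chunkId: "payment-terms", vector: [0.1, 0.9] },
  { chunkId: "renewal-clause", vector: [0.7, 0.7] },
], 2);
console.log(hits.map((h) => h.chunkId)); // [ 'termination-clause', 'renewal-clause' ]
```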

## User Flows

### User Registration & Login Flow

```mermaid
flowchart TD
    A[User Visits Application] --> B{SSO Enabled?}
    B -->|Yes| C[Show SSO Login Option]
    B -->|No| D[Show Local Login Form]

    C --> E[Redirect to Azure AD]
    E --> F[Azure Authentication]
    F --> G[Return with SSO Token]
    G --> H[Validate Token with Backend]
    H --> I[Create/Update User Record]
    I --> J[Generate Internal JWT]
    J --> K[Redirect to Dashboard]

    D --> L[Enter Email/Password]
    L --> M[Submit Credentials]
    M --> N[Backend Validation]
    N --> O{Valid Credentials?}
    O -->|No| P[Show Error Message]
    O -->|Yes| Q[Generate JWT Token]
    Q --> K

    P --> L
```

### Document Upload & Processing Flow

```mermaid
flowchart TD
    A[Select Index] --> B[Choose Files]
    B --> C[File Validation]
    C --> D{Files Valid?}
    D -->|No| E[Show Validation Errors]
    D -->|Yes| F[Upload Progress Bar]
    F --> G[Files Uploaded to Server]
    G --> H[Processing Started]
    H --> I[Real-time Status Updates]
    I --> J{Processing Complete?}
    J -->|No| K[Show Processing Status]
    J -->|Yes| L[Show Success Message]
    K --> I
    E --> B
```

### Chat Query Flow

```mermaid
flowchart TD
    A[User Enters Query] --> B[Check Index Status]
    B --> C{Index Ready?}
    C -->|No| D[Show Index Not Ready Message]
    C -->|Yes| E[Submit Query to Backend]
    E --> F[Show Loading Indicator]
    F --> G[Backend Processing]
    G --> H[Receive Response with Sources]
    H --> I[Display Response]
    I --> J[Show Source References]
    J --> K[Update Chat History]
    K --> L[Enable Follow-up Questions]
```

### Admin Management Flow

```mermaid
flowchart TD
    A[Admin Login] --> B[Access Admin Panel]
    B --> C[System Statistics Dashboard]
    C --> D[Choose Management Action]
    D --> E{Action Type?}
    E -->|User Management| F[View/Edit Users]
    E -->|Index Management| G[Create/Delete Indices]
    E -->|Document Management| H[Upload/Process/Delete Documents]
    E -->|System Monitoring| I[View System Health]

    F --> J[Update User Roles/Access]
    G --> K[Configure Index Settings]
    H --> L[Batch Operations]
    I --> M[Performance Metrics]
```

## Frontend Structure

### Component Architecture

```mermaid
graph TD
    A[App.jsx] --> B[Layout.jsx]
    B --> C[Header.jsx]
    B --> D[Sidebar.jsx]
    B --> E[Main Content Area]

    E --> F[HomePage.jsx]
    E --> G[Dashboard.jsx]
    E --> H[DocumentManager.jsx]
    E --> I[ChatInterface.jsx]
    E --> J[AdminPanel.jsx]

    subgraph "Authentication Components"
        K[LoginPage.jsx]
        L[LoginForm.jsx]
        M[ProtectedRoute.jsx]
        N[ActivityTracker.jsx]
    end

    subgraph "Document Components"
        O[DocumentUpload.jsx]
        P[DocumentSummary.jsx]
        Q[DocumentViewer.jsx]
    end

    subgraph "Chat Components"
        I
        S[CollapsibleSourceChunk.jsx]
    end

    subgraph "Admin Components"
        T[UserEditor.jsx]
        U[IndexManager.jsx]
        V[ProcessingControl.jsx]
        W[RAGInterface.jsx]
    end
```

### State Management

```mermaid
graph TD
    subgraph "React Context Providers"
        A[AuthContext] --> B[User State]
        A --> C[Authentication Methods]
        A --> D[Token Management]
    end

    subgraph "Local State Management"
        E[Component State] --> F[useState Hooks]
        E --> G[useEffect Hooks]
        E --> H[Custom Hooks]
    end

    subgraph "Persistent Storage"
        I[localStorage] --> J[JWT Tokens]
        I --> K[User Preferences]
        I --> L[Session Data]
    end

    B --> E
    C --> E
    D --> I
```

### Service Layer

The frontend organizes API communication into a service layer with one service per backend area (authentication, documents, indices, chat and admin):

```typescript
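// Service architecture (type-level sketch). The frontend itself is plain
// JavaScript; these interfaces only describe how the service layer is grouped.
// Placeholder service types: their methods are not specified in this document.
interface AuthenticationService {}
interface DocumentManagementService {}
interface IndexManagementService {}
interface ChatService {}
interface AdminService {}

interface APIService {
  authService: AuthenticationService;
  documentService: DocumentManagementService;
  indexService: IndexManagementService;
  chatService: ChatService;
  adminService: AdminService;
}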
```
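
One way the chat and document services could turn endpoints from the API tables into HTTP requests is sketched below. The request body shape and the `Bearer` authorization scheme are assumptions (the body field names are borrowed from the ChatMessage model), and `chatQueryRequest`, `listDocumentsRequest` and `ApiRequest` are hypothetical names. The functions return plain request descriptors, which keeps them independent of the HTTP client.

```typescript
interface ApiRequest {
  method: "GET" | "POST" | "DELETE";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

const API_PREFIX = "/api/v1";

// Assumption: the internal JWT is sent as a bearer token.
function authHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` };
}

// POST /api/v1/chat/query (body shape is hypothetical).
function chatQueryRequest(token: string, indexId: string, query: string): ApiRequest {
  return {
    method: "POST",
    url: `${API_PREFIX}/chat/query`,
    headers: { ...authHeaders(token), "Content-Type": "application/json" },
    body: JSON.stringify({ index_id: indexId, query }),
  };
}

// GET /api/v1/documents/{index_id}
function listDocumentsRequest(token: string, indexId: string): ApiRequest {
  return {
    method: "GET",
    url: `${API_PREFIX}/documents/${encodeURIComponent(indexId)}`,
    headers: authHeaders(token),
  };
}
```

With Axios, for instance, such a descriptor maps onto `axios.request({ method, url, headers, data: body })`.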

## Backend Structure

### FastAPI Application Structure

```mermaid
graph TD
    A[main.py] --> B[FastAPI Application]
    B --> C[Middleware Stack]
    C --> D[CORS Middleware]
    C --> E[Authentication Middleware]
    C --> F[Request Timing Middleware]

    B --> G[API Routers]
    G --> H[Authentication Routes]
    G --> I[Document Routes]
    G --> J[Index Routes]
    G --> K[Chat Routes]
    G --> L[Admin Routes]

    subgraph "Core Services"
        M[Config Management]
        N[Database Connections]
        O[Cache Management]
        P[Security Utilities]
    end

    subgraph "Business Logic"
        Q[Document Processor]
        R[RAG Service]
        S[Chat Context Service]
        T[SSO Service]
    end

    H --> M
    H --> T
    I --> Q
    J --> R
    K --> S
    K --> R
    L --> Q
    L --> R
```

### Service Architecture

```mermaid
graph TD
    subgraph "API Layer"
        A[FastAPI Routes]
    end

    subgraph "Service Layer"
        B[Document Processor Service]
        C[RAG Service]
        D[Chat Context Service]
        E[SSO Service]
        F[Contract Summary Service]
    end

    subgraph "Core Layer"
        G[Authentication Core]
        H[Security Core]
        I[Cache Core]
        J[ChromaDB Client]
    end

    subgraph "Data Layer"
        K[MongoDB Models]
        L[Pydantic Schemas]
        M[Database Utilities]
    end

    A --> B
    A --> C
    A --> D
    A --> E
    A --> F

    B --> J
    C --> J
    D --> I
    E --> G
    E --> H

    G --> K
    H --> L
    I --> M
```

## Database Schema

### MongoDB Collections

```mermaid
erDiagram
    users {
        ObjectId _id PK
        string email UK
        string hashed_password
        string role
        boolean is_active
        string auth_method
        string sso_provider
        array index_access
        datetime created_at
        datetime updated_at
    }

    indices {
        ObjectId _id PK
        string index_id UK
        string name
        string description
        ObjectId created_by FK
        string status
        int document_count
        object settings
        datetime created_at
    }

    documents {
        ObjectId _id PK
        string filename
        string index_id FK
        ObjectId uploaded_by FK
        string processing_status
        string embedding_status
        array text_chunks
        int chunk_count
        array vector_ids
        datetime created_at
    }

    chat_messages {
        ObjectId _id PK
        ObjectId user_id FK
        string index_id FK
        string query
        string response
        object debug_info
        float response_time
        boolean cached
        array sources
        datetime created_at
    }

    users ||--o{ indices : "creates"
    users ||--o{ documents : "uploads"
    users ||--o{ chat_messages : "sends"
    indices ||--o{ documents : "contains"
    indices ||--o{ chat_messages : "queries"
```

### ChromaDB Collections

```mermaid
graph TD
    A[ChromaDB Database] --> B["Collection: index_{index_id}"]
    B --> C[Document Vectors]
    C --> D[Vector Data]
    C --> E[Metadata]
    C --> F[Document IDs]

    E --> G[filename]
    E --> H[document_id]
    E --> I[chunk_index]
    E --> J[index_id]
    E --> K[upload_timestamp]
```

### Redis Cache Structure

```mermaid
graph TD
    A[Redis Cache] --> B[Chat Responses]
    A --> C[User Sessions]
    A --> D[Index Metadata]

    B --> E["chat:{index_id}:{query_hash}"]
    C --> F["session:{user_id}"]
    D --> G["index_meta:{index_id}"]

    E --> H[Cached Response + Sources]
    F --> I[User State + Preferences]
    G --> J[Index Statistics]
```

## Deployment Architecture

### Production Deployment

```mermaid
graph TD
    subgraph "Load Balancer"
        A[nginx/ALB]
    end

    subgraph "Application Tier"
        B[FastAPI Container 1]
        C[FastAPI Container 2]
        D[React Frontend]
    end

    subgraph "Data Tier"
        E[MongoDB Cluster]
        F[Redis Cluster]
        G[ChromaDB Persistent Volume]
        H[File Storage]
    end

    subgraph "External Services"
        I[OpenAI API]
        J[LlamaParse API]
        K[Azure AD]
    end

    A --> B
    A --> C
    A --> D

    B --> E
    B --> F
    B --> G
    B --> H
    C --> E
    C --> F
    C --> G
    C --> H

    B --> I
    B --> J
    B --> K
    C --> I
    C --> J
    C --> K
```

### Docker Deployment

```mermaid
graph TD
    A[docker-compose.yml] --> B[Frontend Container]
    A --> C[Backend Container]
    A --> D[MongoDB Container]
    A --> E[Redis Container]

    B --> F[nginx:alpine]
    C --> G[python:3.11]
    D --> H[mongo:latest]
    E --> I[redis:alpine]

    subgraph "Volumes"
        J[uploads_volume]
        K[indices_volume]
        L[mongo_data]
        M[redis_data]
    end

    C --> J
    C --> K
    D --> L
    E --> M
```

### Environment Configuration

```mermaid
graph TD
    A[Environment Variables] --> B[Database Config]
    A --> C[API Keys]
    A --> D[Security Settings]
    A --> E[Feature Flags]

    B --> F[MONGODB_URL]
    B --> G[REDIS_URL]

    C --> H[OPENAI_API_KEY]
    C --> I[LLAMAPARSE_API_KEY]

    D --> J[JWT_SECRET_KEY]
    D --> K[CORS_ORIGINS]

    E --> L[SSO_ENABLED]
    E --> M[CACHE_ENABLED]
    E --> N[DEBUG]
```
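
The variables above could be loaded and validated once at startup, as in this TypeScript sketch (for illustration; the real backend is Python). Which variables are mandatory, the feature-flag defaults, the accepted truthy spellings and the comma-separated format of `CORS_ORIGINS` are all assumptions, and `loadConfig` is a hypothetical name. Taking the environment as a parameter keeps the function easy to test.

```typescript
interface AppConfig {
  mongodbUrl: string;
  redisUrl: string;
  openaiApiKey: string;
  llamaparseApiKey: string;
  jwtSecretKey: string;
  corsOrigins: string[];
  ssoEnabled: boolean;
  cacheEnabled: boolean;
  debug: boolean;
}

type Env = Record<string, string | undefined>;

// Fails fast at startup instead of at the first request that needs the value.
function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`missing required environment variable ${name}`);
  }
  return value;
}

// Assumption: "1", "true", "yes" and "on" (any case) count as enabled.
function flag(env: Env, name: string, fallback: boolean): boolean {
  const value = env[name];
  if (value === undefined) return fallback;
  return ["1", "true", "yes", "on"].includes(value.trim().toLowerCase());
}

function loadConfig(env: Env): AppConfig {
  return {
    mongodbUrl: required(env, "MONGODB_URL"),
    redisUrl: required(env, "REDIS_URL"),
    openaiApiKey: required(env, "OPENAI_API_KEY"),
    llamaparseApiKey: required(env, "LLAMAPARSE_API_KEY"),
    jwtSecretKey: required(env, "JWT_SECRET_KEY"),
    corsOrigins: (env["CORS_ORIGINS"] ?? "")
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin.length > 0),
    ssoEnabled: flag(env, "SSO_ENABLED", false),
    cacheEnabled: flag(env, "CACHE_ENABLED", true),
    debug: flag(env, "DEBUG", false),
  };
}
```

In a Node process this would be called as `loadConfig(process.env)`.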

## Security Features

### Security Architecture

```mermaid
graph TD
    subgraph "Authentication Layer"
        A[JWT Tokens]
        B[Password Hashing]
        C[SSO Integration]
        D[Session Management]
    end

    subgraph "Authorization Layer"
        E[Role-Based Access]
        F[Index-Level Permissions]
        G[Admin Controls]
        H[User Restrictions]
    end

    subgraph "Data Security"
        I[Input Validation]
        J[NoSQL Injection Prevention]
        K[File Upload Validation]
        L[Data Encryption]
    end

    subgraph "Network Security"
        M[CORS Configuration]
        N[HTTPS Enforcement]
        O[Rate Limiting]
        P[API Security Headers]
    end

    A --> E
    B --> F
    C --> G
    D --> H

    E --> I
    F --> J
    G --> K
    H --> L

    I --> M
    J --> N
    K --> O
    L --> P
```

### Security Measures

1. **Authentication Security**
   - JWT tokens with configurable expiration
   - Bcrypt password hashing with a per-password salt and configurable cost factor
   - Azure AD integration with token validation
   - Automatic session cleanup

2. **Authorization Controls**
   - Role-based access control (Admin/User)
   - Index-level access permissions
   - Protected route implementation
   - Resource-level authorization checks

3. **Input Validation & Sanitization**
   - Pydantic schema validation
   - File type and size restrictions
   - NoSQL injection prevention through validated, typed inputs to MongoDB queries
   - XSS protection in frontend

4. **Data Protection**
   - Hashed password storage (bcrypt)
   - Secure token transmission
   - Private document storage
   - Audit logging for admin actions

## Performance Optimizations

### Caching Strategy

```mermaid
graph TD
    A[Client Request] --> B{Cache Layer 1}
    B -->|Hit| C[Return Cached Response]
    B -->|Miss| D{Cache Layer 2}
    D -->|Hit| E[Return Database Cache]
    D -->|Miss| F[Process Request]
    F --> G[Update All Caches]
    G --> H[Return Response]

    subgraph "Cache Layers"
        I[Browser Cache]
        J[Redis Application Cache]
        K[Database Query Cache]
        L[Vector Search Cache]
    end
```

### Database Optimizations

1. **MongoDB Indexing Strategy**
   - Compound indexes on frequently queried fields
   - Text indexes for search functionality
   - TTL indexes for automatic cleanup
   - Index monitoring and optimization

2. **Query Optimization**
   - Aggregation pipeline optimization
   - Projection to reduce data transfer
   - Pagination for large result sets
   - Connection pooling for efficiency

3. **Vector Store Optimization**
   - Batch embedding generation
   - Optimized chunk sizes for retrieval
   - Index compression for storage efficiency
   - Similarity search optimization
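
Batch embedding generation means sending many chunks per embeddings request instead of one request per chunk; the OpenAI embeddings endpoint accepts an array of input strings. The TypeScript sketch below illustrates the batching only. The batch size of 100, the injected `EmbedBatch` function and the helper names are hypothetical, and a real client would be asynchronous and would also respect the API's per-request input limits.

```typescript
// Splits items into consecutive batches of at most batchSize elements.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical embedder: one API round trip per call, one vector per input text.
type EmbedBatch = (texts: string[]) => number[][];

function embedAll(chunks: readonly string[], embedBatch: EmbedBatch, batchSize = 100): number[][] {
  const vectors: number[][] = [];
  for (const batch of toBatches(chunks, batchSize)) {
    vectors.push(...embedBatch(batch)); // preserves chunk order
  }
  return vectors;
}

// Fake embedder for illustration: counts calls and returns 1-dimensional vectors.
let requests = 0;
const fakeEmbed: EmbedBatch = (texts) => {
  requests++;
  return texts.map((t) => [t.length]);
};
const vectors = embedAll(Array.from({ length: 250 }, (_, i) => `chunk ${i}`), fakeEmbed);
console.log(requests, vectors.length); // 3 250
```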

### Frontend Performance

1. **Code Splitting**
   - Route-based code splitting
   - Lazy loading of components
   - Dynamic imports for optimization
   - Bundle size analysis

2. **Caching & Storage**
   - Service worker caching
   - Local storage optimization
   - API response caching
   - Static asset caching

3. **Rendering Optimization**
   - React.memo for expensive components
   - useCallback for function optimization
   - Virtual scrolling for large lists
   - Debounced search inputs

### Backend Performance

1. **Async Processing**
   - Non-blocking I/O operations
   - Background task processing
   - Queue-based document processing
   - Concurrent request handling

2. **Memory Management**
   - Efficient object lifecycle management
   - Memory pool optimization
   - Garbage collection tuning
   - Resource cleanup automation

3. **API Optimization**
   - Response compression
   - Pagination implementation
   - Field selection for responses
   - Request/response caching

---

## Conclusion

The Contract Analysis Tool v2.0 represents a comprehensive, production-ready solution for intelligent document analysis and querying. The architecture emphasizes scalability, security, and performance while maintaining ease of use and deployment flexibility.

Key architectural strengths:
- **Modular Design**: Clear separation of concerns through dedicated service modules
- **Scalable Storage**: Hybrid database architecture optimized for different data types
- **Security-First**: Comprehensive authentication and authorization implementation
- **Performance-Optimized**: Multi-layer caching and async processing
- **Developer-Friendly**: Well-structured codebase with comprehensive documentation

The system is designed to handle enterprise-scale document processing workloads while providing an intuitive user experience for both administrators and end users.