# Contract Analysis Tool v2.0 - Technical Documentation

## Table of Contents

1. [System Overview](#system-overview)
2. [Architecture](#architecture)
3. [Technology Stack](#technology-stack)
4. [Data Models](#data-models)
5. [API Documentation](#api-documentation)
6. [Authentication & Authorization](#authentication--authorization)
7. [Document Processing Pipeline](#document-processing-pipeline)
8. [RAG System & Chat Implementation](#rag-system--chat-implementation)
9. [User Flows](#user-flows)
10. [Frontend Structure](#frontend-structure)
11. [Backend Structure](#backend-structure)
12. [Database Schema](#database-schema)
13. [Deployment Architecture](#deployment-architecture)
14. [Security Features](#security-features)
15. [Performance Optimizations](#performance-optimizations)
16. [Conclusion](#conclusion)

## System Overview

The Contract Analysis Tool v2.0 is a production-ready Retrieval-Augmented Generation (RAG) application for contract analysis and document Q&A. It lets organizations upload, process, and query legal documents in natural language, using OpenAI's GPT-4 and LlamaIndex.

### Key Features

- **Document Management**: Upload legal documents and organize them into searchable indices
- **Intelligent Q&A**: Natural language querying with contextual responses
- **Role-Based Access Control**: Admin and user roles with index-level permissions
- **Real-time Processing**: Asynchronous document processing with progress tracking
- **Multi-format Support**: PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF
- **Vector Search**: ChromaDB-powered semantic search based on embedding similarity
- **Chat Context**: Conversation continuity with a 24-hour rolling context window
- **SSO Integration**: Azure Active Directory integration with local fallback
- **Admin Dashboard**: System statistics plus user, index, and document management tools

## Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        UI[React Frontend]
        Mobile[Mobile Browser]
    end

    subgraph "API Gateway"
        Gateway[FastAPI Application]
        Auth[JWT Authentication]
        CORS[CORS Middleware]
    end

    subgraph "Business Logic"
        AuthSvc[Auth Service]
        DocSvc[Document Service]
        RAGSvc[RAG Service]
        ChatSvc[Chat Service]
        AdminSvc[Admin Service]
    end

    subgraph "Data Storage"
        MongoDB[(MongoDB)]
        Redis[(Redis Cache)]
        ChromaDB[(ChromaDB Vector Store)]
        FileSystem[File System Storage]
    end

    subgraph "External Services"
        OpenAI[OpenAI API]
        LlamaParse[LlamaParse API]
        AzureAD[Azure AD SSO]
    end

    UI --> Gateway
    Mobile --> Gateway
    Gateway --> Auth
    Gateway --> CORS

    Gateway --> AuthSvc
    Gateway --> DocSvc
    Gateway --> RAGSvc
    Gateway --> ChatSvc
    Gateway --> AdminSvc

    AuthSvc --> MongoDB
    AuthSvc --> AzureAD
    DocSvc --> MongoDB
    DocSvc --> FileSystem
    RAGSvc --> ChromaDB
    RAGSvc --> OpenAI
    ChatSvc --> MongoDB
    ChatSvc --> Redis
    AdminSvc --> MongoDB

    DocSvc --> LlamaParse
    RAGSvc --> LlamaParse
```

### System Architecture Principles

- **Modular Services**: Authentication, document, RAG, chat, and admin logic live in separate service modules behind a single FastAPI application, each with a clear responsibility
- **Async Processing**: Non-blocking operations for document processing and embedding generation
- **Caching Strategy**: Multi-layer caching, with Redis for API responses and application state
- **Scalable Storage**: Hybrid storage combining structured (MongoDB), cache (Redis), and vector (ChromaDB) databases
- **Security-First**: JWT-based authentication with role-based access control and input validation

## Technology Stack

### Backend Technologies

```mermaid
graph LR
    subgraph "Core Framework"
        FastAPI[FastAPI 0.104+]
        Python[Python 3.11+]
        Pydantic[Pydantic v2]
    end

    subgraph "AI/ML Stack"
        LlamaIndex[LlamaIndex]
        OpenAI[OpenAI GPT-4]
        Embeddings[OpenAI Embeddings]
        LlamaParse[LlamaParse]
    end

    subgraph "Data Layer"
        MongoDB[MongoDB]
        Motor[Motor Async Driver]
        ChromaDB[ChromaDB]
        Redis[Redis]
    end

    subgraph "Authentication"
        JWT[JWT Tokens]
        MSAL[MSAL Azure AD]
        Passlib[Passlib Hashing]
    end
```

### Frontend Technologies

```mermaid
graph LR
    subgraph "Core Framework"
        React[React 18+]
        Vite[Vite Build Tool]
        JavaScript[JavaScript ES6+]
    end

    subgraph "UI/UX"
        TailwindCSS[Tailwind CSS]
        Headless[Headless UI]
        Heroicons[Heroicons]
    end

    subgraph "State Management"
        Context[React Context]
        Hooks[React Hooks]
        LocalStorage[Local Storage]
    end

    subgraph "HTTP & Auth"
        Axios[Axios HTTP Client]
        MSALReact["@azure/msal-react"]
        ReactRouter[React Router]
    end
```

## Data Models

### User Model

```mermaid
erDiagram
    User {
        ObjectId _id PK
        EmailStr email
        UserRole role "admin|user"
        boolean is_active
        AuthMethod auth_method "local|sso"
        string hashed_password "optional for SSO"
        string sso_provider
        string sso_user_id
        string sso_email
        string sso_name
        dict sso_attributes
        datetime last_sso_login
        list index_access "accessible index IDs"
        datetime created_at
        datetime updated_at
    }
```

### Document Model

```mermaid
erDiagram
    Document {
        ObjectId _id PK
        string filename
        string original_filename
        int file_size
        string content_type
        string index_id FK
        ObjectId uploaded_by FK
        string file_path
        string processing_status "pending|processing|completed|failed"
        dict metadata
        string parsed_text
        list text_chunks
        string embedding_status "pending|processing|completed|failed"
        int chunk_count
        list vector_ids
        dict contract_summary
        string summary_status "pending|processing|completed|failed"
        datetime summary_created_at
        datetime created_at
        datetime updated_at
    }
```

### Index Model

```mermaid
erDiagram
    Index {
        ObjectId _id PK
        string name
        string description
        string index_id UK "unique identifier"
        ObjectId created_by FK
        string status "active|inactive|deleted"
        int document_count
        dict settings
        string vector_store_path
        string embedding_model "default text-embedding-3-small"
        int chunk_size "default 1000"
        int chunk_overlap "default 200"
        datetime created_at
        datetime updated_at
    }
```

### Chat Message Model

```mermaid
erDiagram
    ChatMessage {
        ObjectId _id PK
        ObjectId user_id FK
        string index_id FK
        string query
        string response
        dict debug_info
        float response_time
        boolean cached
        list sources
        string context_used
        boolean deleted_by_user
        datetime created_at
        datetime updated_at
    }
```

### Entity Relationships

```mermaid
erDiagram
    User ||--o{ Index : "creates"
    User ||--o{ Document : "uploads"
    User ||--o{ ChatMessage : "sends"
    Index ||--o{ Document : "contains"
    Index ||--o{ ChatMessage : "queries"

    User {
        ObjectId _id PK
        EmailStr email
        UserRole role
        list index_access
    }

    Index {
        ObjectId _id PK
        string index_id UK
        string name
        ObjectId created_by FK
    }

    Document {
        ObjectId _id PK
        string filename
        string index_id FK
        ObjectId uploaded_by FK
    }

    ChatMessage {
        ObjectId _id PK
        ObjectId user_id FK
        string index_id FK
        string query
        string response
    }
```

## API Documentation

### Authentication Endpoints

| Method | Endpoint | Description | Auth Required |
|--------|----------|-------------|---------------|
| POST | `/api/v1/auth/login` | Local user authentication | No |
| POST | `/api/v1/auth/register` | User registration | No |
| GET | `/api/v1/auth/me` | Get current user info | Yes |
| POST | `/api/v1/auth/refresh` | Refresh JWT token | Yes |
| POST | `/api/v1/auth/logout` | User logout | No |
| GET | `/api/v1/auth/sso/config` | Get SSO configuration | No |
| POST | `/api/v1/auth/sso/validate` | Validate SSO token | No |
| POST | `/api/v1/auth/login/local` | Backup admin login | No |
| POST | `/api/v1/auth/init-users` | Initialize default users | No |

### Document Management Endpoints

| Method | Endpoint | Description | Auth Required | Role |
|--------|----------|-------------|---------------|------|
| POST | `/api/v1/documents/upload` | Upload documents to index | Yes | User/Admin |
| GET | `/api/v1/documents/{index_id}` | List documents in index | Yes | User/Admin |

### Index Management Endpoints

| Method | Endpoint | Description | Auth Required | Role |
|--------|----------|-------------|---------------|------|
| POST | `/api/v1/indices/create` | Create new document index | Yes | User/Admin |
| GET | `/api/v1/indices/` | List user's accessible indices | Yes | User/Admin |

### Chat Endpoints

| Method | Endpoint | Description | Auth Required | Role |
|--------|----------|-------------|---------------|------|
| POST | `/api/v1/chat/query` | Natural language document query | Yes | User/Admin |

### Admin Endpoints

| Method | Endpoint | Description | Auth Required | Role |
|--------|----------|-------------|---------------|------|
| GET | `/api/v1/admin/stats` | System statistics | Yes | Admin |
| POST | `/api/v1/admin/documents/upload-single` | Upload single document | Yes | Admin |
| POST | `/api/v1/admin/documents/upload-multiple` | Upload multiple documents | Yes | Admin |
| GET | `/api/v1/admin/documents/{index_id}` | Get index documents | Yes | Admin |
| POST | `/api/v1/admin/documents/{document_id}/reprocess` | Reprocess document | Yes | Admin |
| DELETE | `/api/v1/admin/documents/{document_id}` | Delete document | Yes | Admin |
| GET | `/api/v1/admin/indices` | Get all indices | Yes | Admin |
| POST | `/api/v1/admin/indices/create` | Create new index | Yes | Admin |
| POST | `/api/v1/admin/chat/query` | Admin RAG query interface | Yes | Admin |
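
The tables give only method and path. The helper below is a minimal sketch, not the project's client: it fills `{index_id}`/`{document_id}` placeholders and attaches the JWT. The `Authorization: Bearer <token>` scheme, the `ApiCall` shape, and the function names are assumptions.

```typescript
// Sketch: build method/URL/headers for the v1 endpoints listed above.
// Assumption: the backend expects the JWT as "Authorization: Bearer <token>".

type HttpMethod = "GET" | "POST" | "DELETE";

interface ApiCall {
  method: HttpMethod;
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Replace "{name}" placeholders (e.g. "{index_id}") with URL-encoded values.
function fillPath(template: string, params: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = params[key];
    if (value === undefined) {
      throw new Error(`Missing path parameter: ${key}`);
    }
    return encodeURIComponent(value);
  });
}

function buildApiCall(
  baseUrl: string,
  method: HttpMethod,
  pathTemplate: string,
  options: { params?: Record<string, string>; token?: string; json?: unknown } = {},
): ApiCall {
  const headers: Record<string, string> = { Accept: "application/json" };
  if (options.token) {
    headers["Authorization"] = `Bearer ${options.token}`;
  }
  const call: ApiCall = {
    method,
    // Strip trailing slashes so "https://host/" + "/api/..." has no double slash.
    url: baseUrl.replace(/\/+$/, "") + fillPath(pathTemplate, options.params ?? {}),
    headers,
  };
  if (options.json !== undefined) {
    headers["Content-Type"] = "application/json";
    call.body = JSON.stringify(options.json);
  }
  return call;
}
```

For example, `buildApiCall(baseUrl, "POST", "/api/v1/admin/documents/{document_id}/reprocess", { params: { document_id: id }, token })` produces a call descriptor that can be mapped onto Axios or `fetch`.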

## Authentication & Authorization

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant FastAPI
    participant MongoDB
    participant AzureAD

    Note over User,AzureAD: SSO Authentication Flow
    User->>Frontend: Access Application
    Frontend->>FastAPI: Check SSO Config
    FastAPI-->>Frontend: SSO Configuration
    Frontend->>AzureAD: Redirect to SSO Login
    AzureAD->>Frontend: SSO Token
    Frontend->>FastAPI: Validate SSO Token
    FastAPI->>AzureAD: Verify Token
    AzureAD-->>FastAPI: User Claims
    FastAPI->>MongoDB: Create/Update User
    FastAPI-->>Frontend: Internal JWT Token

    Note over User,AzureAD: Local Authentication Flow
    User->>Frontend: Local Login Form
    Frontend->>FastAPI: Email/Password
    FastAPI->>MongoDB: Verify Credentials
    MongoDB-->>FastAPI: User Data
    FastAPI-->>Frontend: JWT Token + User Info
```

### Authentication Methods

1. **Single Sign-On (SSO)**
   - Azure Active Directory integration
   - Automatic user provisioning
   - Role mapping from AD groups
   - Token validation and refresh

2. **Local Authentication**
   - Email/password authentication
   - Bcrypt password hashing
   - JWT token-based sessions
   - Backup admin access (`/api/v1/auth/login/local`)
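
Role mapping from AD groups might look like the sketch below. It assumes the Azure AD token carries a `groups` claim of group object IDs (Azure AD emits this claim only when group claims are configured for the app registration). The admin group ID and the "membership in any admin group grants admin" rule are hypothetical, since the mapping rules are not documented here.

```typescript
// Sketch: derive the application role from Azure AD group claims.
// Hypothetical: the admin group ID below and the "any admin group => admin" rule.

type UserRole = "admin" | "user";

interface SsoClaims {
  oid: string;        // Azure AD object ID of the signed-in user
  email?: string;
  groups?: string[];  // present only when group claims are enabled for the app
}

const ADMIN_GROUP_IDS: ReadonlySet<string> = new Set([
  "00000000-0000-0000-0000-00000000a0a0", // placeholder group object ID
]);

function mapRoleFromGroups(
  claims: SsoClaims,
  adminGroups: ReadonlySet<string> = ADMIN_GROUP_IDS,
): UserRole {
  const groups = claims.groups ?? [];
  // Fall back to the least-privileged role when no admin group matches.
  return groups.some((g) => adminGroups.has(g)) ? "admin" : "user";
}
```

Defaulting to `user` when the claim is absent keeps newly provisioned SSO users at the least-privileged role.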

### Authorization Levels

```mermaid
graph TD
    A[User Request] --> B{Authenticated?}
    B -->|No| C[Return 401 Unauthorized]
    B -->|Yes| D{Valid Role?}
    D -->|No| E[Return 403 Forbidden]
    D -->|Yes| F{Index Access?}
    F -->|No| G[Return 403 Forbidden]
    F -->|Yes| H[Process Request]

    subgraph "Role Hierarchy"
        I[Admin] --> J[Full System Access]
        K[User] --> L[Restricted Access]
    end
```
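
The decision flow above can be expressed as a small pure function. This is a sketch under assumptions: `AuthUser` mirrors only the `role` and `index_access` fields of the User model, and admins are assumed to bypass the per-index check, following "Full System Access" in the role hierarchy.

```typescript
// Sketch of the authorization flow: 401 -> role 403 -> index 403 -> allow.
// Assumption: admins can access every index.

type Role = "admin" | "user";

interface AuthUser {
  role: Role;
  indexAccess: string[]; // mirrors the index_access field on the User model
}

type AuthDecision =
  | { allowed: true }
  | { allowed: false; status: 401 | 403; reason: string };

function authorize(
  user: AuthUser | null,
  allowedRoles: readonly Role[],
  indexId?: string,
): AuthDecision {
  // 401: no authenticated user on the request
  if (user === null) {
    return { allowed: false, status: 401, reason: "not authenticated" };
  }
  // 403: authenticated, but the endpoint is not open to this role
  if (!allowedRoles.includes(user.role)) {
    return { allowed: false, status: 403, reason: "role not permitted" };
  }
  // 403: index-scoped request for an index the user was not granted
  if (indexId !== undefined && user.role !== "admin" && !user.indexAccess.includes(indexId)) {
    return { allowed: false, status: 403, reason: "no access to index" };
  }
  return { allowed: true };
}
```

In FastAPI this check would typically run as a route dependency before the handler. The order matters: authentication is resolved first so that an anonymous request gets 401 rather than 403.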

## Document Processing Pipeline

```mermaid
flowchart TD
    A[User Uploads Document] --> B[File Validation]
    B --> C{Valid File?}
    C -->|No| D[Return Error]
    C -->|Yes| E[Store File to Disk]
    E --> F[Create Document Record]
    F --> G[Update Status: Processing]

    G --> H[LlamaParse Processing]
    H --> I{Parse Success?}
    I -->|No| J[Update Status: Failed]
    I -->|Yes| K[Extract Text Content]
    K --> L[Text Chunking]
    L --> M[Generate Embeddings]
    M --> N[Store in ChromaDB]
    N --> O[Update Vector IDs]
    O --> P[Update Status: Completed]

    subgraph "Async Processing"
        H
        I
        K
        L
        M
        N
        O
        P
    end

    subgraph "Status Tracking"
        Q[pending] --> R[processing]
        R --> S[completed]
        R --> T[failed]
    end
```
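
The "Status Tracking" lifecycle can be enforced with an explicit transition table. In this sketch, letting `completed` and `failed` documents return to `processing` is an assumption based on the admin reprocess endpoint; the diagram itself shows only the forward transitions.

```typescript
// Sketch: processing-status state machine from the "Status Tracking" subgraph.
// Assumption: reprocessing (POST /api/v1/admin/documents/{document_id}/reprocess)
// moves completed or failed documents back to "processing".

type ProcessingStatus = "pending" | "processing" | "completed" | "failed";

const ALLOWED_TRANSITIONS: Record<ProcessingStatus, readonly ProcessingStatus[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  completed: ["processing"], // reprocess (assumption)
  failed: ["processing"],    // reprocess (assumption)
};

function canTransition(from: ProcessingStatus, to: ProcessingStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Returns the new status, or throws so an invalid update never reaches the database.
function transition(from: ProcessingStatus, to: ProcessingStatus): ProcessingStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid status transition: ${from} -> ${to}`);
  }
  return to;
}
```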

### Document Processing States

1. **Upload Phase**
   - File validation (type, size, format)
   - Virus scanning (if configured)
   - File system storage
   - Database record creation

2. **Processing Phase**
   - LlamaParse API integration
   - Text extraction and cleaning
   - Content chunking
   - Metadata extraction

3. **Embedding Phase**
   - OpenAI embedding generation
   - Vector storage in ChromaDB
   - Index organization
   - Completion status updates

### Supported File Formats

| Format | Extension | Processing Method | Max Size |
|--------|-----------|-------------------|----------|
| PDF | .pdf | LlamaParse | 50 MB |
| Word Document | .docx, .doc | LlamaParse | 50 MB |
| Text | .txt | Direct parsing | 10 MB |
| CSV | .csv | Structured parsing | 25 MB |
| JSON | .json | Structured parsing | 25 MB |
| HTML | .html, .htm | Content extraction | 10 MB |
| Markdown | .md | Direct parsing | 10 MB |
| RTF | .rtf | Text extraction | 25 MB |
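
A client-side check against the table above could look like this sketch. It assumes "MB" means 1024 * 1024 bytes and matches on the lowercase extension only; the names are hypothetical.

```typescript
// Sketch: pre-upload validation mirroring the supported-formats table.
// Assumption: 1 MB = 1024 * 1024 bytes.
const MB = 1024 * 1024;

// Extension (lowercase, without the dot) -> maximum size in bytes.
// A Map avoids false hits on inherited keys such as "constructor".
const MAX_BYTES_BY_EXTENSION = new Map<string, number>([
  ["pdf", 50 * MB],
  ["docx", 50 * MB],
  ["doc", 50 * MB],
  ["txt", 10 * MB],
  ["csv", 25 * MB],
  ["json", 25 * MB],
  ["html", 10 * MB],
  ["htm", 10 * MB],
  ["md", 10 * MB],
  ["rtf", 25 * MB],
]);

type UploadCheck = { ok: true; extension: string } | { ok: false; error: string };

function validateUpload(filename: string, sizeBytes: number): UploadCheck {
  const dot = filename.lastIndexOf(".");
  const extension = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  const limit = MAX_BYTES_BY_EXTENSION.get(extension);
  if (limit === undefined) {
    return { ok: false, error: `Unsupported file type: "${filename}"` };
  }
  if (sizeBytes <= 0) {
    return { ok: false, error: "File is empty" };
  }
  if (sizeBytes > limit) {
    return { ok: false, error: `${filename} exceeds the ${limit / MB} MB limit for .${extension} files` };
  }
  return { ok: true, extension };
}
```

The server should repeat the same checks, because client-side validation only improves feedback and can be bypassed.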

## RAG System & Chat Implementation

```mermaid
sequenceDiagram
    participant User
    participant ChatAPI
    participant ContextService
    participant RAGService
    participant ChromaDB
    participant OpenAI
    participant MongoDB

    User->>ChatAPI: Submit Query
    ChatAPI->>ContextService: Get Conversation Context
    ContextService->>MongoDB: Fetch Recent Messages
    MongoDB-->>ContextService: Last 10 Messages (24h)
    ContextService-->>ChatAPI: Context Summary

    ChatAPI->>RAGService: Process Query with Context
    RAGService->>ChromaDB: Vector Similarity Search
    ChromaDB-->>RAGService: Relevant Documents
    RAGService->>OpenAI: Generate Response
    OpenAI-->>RAGService: AI Response
    RAGService-->>ChatAPI: Response + Sources

    ChatAPI->>MongoDB: Store Chat Message
    ChatAPI-->>User: Response + Context Info
```

### Chat Context System

The chat system manages conversation context so that follow-up questions can build on earlier turns:

#### Context Window Management

- **Time Window**: 24-hour rolling window for context relevance
- **Message Limit**: At most 10 previous messages, to avoid exceeding the model's token limit
- **Smart Selection**: Prioritizes recent and relevant messages for context

#### Context Generation Process

1. **Message Retrieval**: Fetch recent messages within the time window
2. **Relevance Filtering**: Score messages by similarity to the current query
3. **Context Summarization**: Generate a concise context summary
4. **Token Management**: Ensure the context fits within model limits
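
The retrieval step of this process might be sketched as follows. It applies the two documented limits (24-hour window, at most 10 messages) and drops messages flagged `deleted_by_user`. Scoping history to one user and one index is an assumption, and relevance filtering and summarization (steps 2 and 3) are omitted, so this version selects by recency only.

```typescript
// Sketch: select prior messages for conversation context (recency only).
// Assumption: context is scoped per user and per index.

interface StoredMessage {
  userId: string;
  indexId: string;
  query: string;
  response: string;
  createdAt: Date;
  deletedByUser: boolean;
}

const CONTEXT_WINDOW_MS = 24 * 60 * 60 * 1000; // 24-hour rolling window
const MAX_CONTEXT_MESSAGES = 10;

function selectContextMessages(
  history: StoredMessage[],
  userId: string,
  indexId: string,
  now: Date = new Date(),
): StoredMessage[] {
  const cutoff = now.getTime() - CONTEXT_WINDOW_MS;
  return history
    .filter((m) => m.userId === userId && m.indexId === indexId && !m.deletedByUser)
    .filter((m) => m.createdAt.getTime() >= cutoff)
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime()) // newest first
    .slice(0, MAX_CONTEXT_MESSAGES)
    .reverse(); // chronological order for prompt assembly
}
```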

#### Caching Strategy

```mermaid
graph TD
    A[User Query] --> B{Has Context?}
    B -->|No| C[Simple Query Cache]
    B -->|Yes| D[Dynamic Response]
    C --> E{Cache Hit?}
    E -->|Yes| F[Return Cached Response]
    E -->|No| G[Generate & Cache Response]
    D --> H[Generate Contextual Response]
    G --> I[Return Response]
    H --> I
```
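
The caching rule above can be sketched as follows. A `Map` stands in for Redis, `generate` is synchronous for brevity (the real model call is asynchronous), and the `<index_id>:<normalized query>` key is a simplification; all names are hypothetical.

```typescript
// Sketch: cache only context-free queries; contextual queries are always generated.

interface ChatAnswer {
  response: string;
  sources: string[];
  cached: boolean;
}

type Generate = (query: string, context: string | null) => ChatAnswer;

function answerQuery(
  cache: Map<string, ChatAnswer>,
  indexId: string,
  query: string,
  context: string | null,
  generate: Generate,
): ChatAnswer {
  if (context !== null && context.length > 0) {
    return generate(query, context); // dynamic response, never cached
  }
  const key = `${indexId}:${query.trim().toLowerCase()}`; // simplified key (hypothetical)
  const hit = cache.get(key);
  if (hit !== undefined) {
    return { ...hit, cached: true };
  }
  const fresh = generate(query, null);
  cache.set(key, fresh);
  return fresh;
}
```

Contextual answers are not cached because the same question can have a different correct answer depending on the preceding conversation; a key built from the query alone would then serve stale responses.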

### Vector Search Implementation

The RAG system uses ChromaDB for vector similarity search:

#### Embedding Strategy

- **Model**: OpenAI `text-embedding-3-small` (1536 dimensions)
- **Chunk Size**: 1000 characters with a 200-character overlap
- **Similarity Metric**: Cosine similarity with configurable top-k results
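
The chunking parameters above translate into a sliding window that advances by `chunkSize - chunkOverlap` characters. This sketch is a plain character splitter for illustration; splitters such as LlamaIndex's node parsers may also respect sentence boundaries and measure size in tokens, so actual chunk boundaries can differ.

```typescript
// Sketch: fixed-size character chunking with overlap (defaults 1000 / 200).

interface TextChunk {
  chunkIndex: number;
  start: number; // character offset (inclusive)
  end: number;   // character offset (exclusive)
  text: string;
}

function chunkText(text: string, chunkSize = 1000, chunkOverlap = 200): TextChunk[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("Require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const chunks: TextChunk[] = [];
  const step = chunkSize - chunkOverlap; // each window starts this far after the previous one
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ chunkIndex: chunks.length, start, end, text: text.slice(start, end) });
    if (end === text.length) break; // the last window reached the end of the text
  }
  return chunks;
}
```

With the defaults, a 2,600-character document yields three chunks starting at offsets 0, 800, and 1600, each sharing 200 characters with its neighbor.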

#### Query Processing

1. **Query Embedding**: Convert the natural language query to a vector
2. **Similarity Search**: Find the most relevant document chunks
3. **Result Ranking**: Score and rank results by relevance
4. **Context Assembly**: Combine search results with conversation context
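
Steps 2 and 3 rank chunks by cosine similarity between the query embedding and each chunk embedding. ChromaDB performs this search internally with an approximate (HNSW) index, so the brute-force sketch below only illustrates the scoring math; the names are hypothetical.

```typescript
// Sketch: brute-force cosine-similarity ranking with top-k selection.

interface ScoredChunk {
  id: string;
  score: number; // cosine similarity in [-1, 1]
}

function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat similarity with a zero vector as 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(
  query: readonly number[],
  chunks: ReadonlyArray<{ id: string; embedding: readonly number[] }>,
  k: number,
): ScoredChunk[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}
```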

## User Flows

### User Registration & Login Flow

```mermaid
flowchart TD
    A[User Visits Application] --> B{SSO Enabled?}
    B -->|Yes| C[Show SSO Login Option]
    B -->|No| D[Show Local Login Form]

    C --> E[Redirect to Azure AD]
    E --> F[Azure Authentication]
    F --> G[Return with SSO Token]
    G --> H[Validate Token with Backend]
    H --> I[Create/Update User Record]
    I --> J[Generate Internal JWT]
    J --> K[Redirect to Dashboard]

    D --> L[Enter Email/Password]
    L --> M[Submit Credentials]
    M --> N[Backend Validation]
    N --> O{Valid Credentials?}
    O -->|No| P[Show Error Message]
    O -->|Yes| Q[Generate JWT Token]
    Q --> K

    P --> L
```

### Document Upload & Processing Flow

```mermaid
flowchart TD
    A[Select Index] --> B[Choose Files]
    B --> C[File Validation]
    C --> D{Files Valid?}
    D -->|No| E[Show Validation Errors]
    D -->|Yes| F[Upload Progress Bar]
    F --> G[Files Uploaded to Server]
    G --> H[Processing Started]
    H --> I[Real-time Status Updates]
    I --> J{Processing Complete?}
    J -->|No| K[Show Processing Status]
    J -->|Yes| L[Show Success Message]
    K --> I
    E --> B
```

### Chat Query Flow

```mermaid
flowchart TD
    A[User Enters Query] --> B[Check Index Status]
    B --> C{Index Ready?}
    C -->|No| D[Show Index Not Ready Message]
    C -->|Yes| E[Submit Query to Backend]
    E --> F[Show Loading Indicator]
    F --> G[Backend Processing]
    G --> H[Receive Response with Sources]
    H --> I[Display Response]
    I --> J[Show Source References]
    J --> K[Update Chat History]
    K --> L[Enable Follow-up Questions]
```

### Admin Management Flow

```mermaid
flowchart TD
    A[Admin Login] --> B[Access Admin Panel]
    B --> C[System Statistics Dashboard]
    C --> D[Choose Management Action]
    D --> E{Action Type?}
    E -->|User Management| F[View/Edit Users]
    E -->|Index Management| G[Create/Delete Indices]
    E -->|Document Management| H[Upload/Process/Delete Documents]
    E -->|System Monitoring| I[View System Health]

    F --> J[Update User Roles/Access]
    G --> K[Configure Index Settings]
    H --> L[Batch Operations]
    I --> M[Performance Metrics]
```

## Frontend Structure

### Component Architecture

```mermaid
graph TD
    A[App.jsx] --> B[Layout.jsx]
    B --> C[Header.jsx]
    B --> D[Sidebar.jsx]
    B --> E[Main Content Area]

    E --> F[HomePage.jsx]
    E --> G[Dashboard.jsx]
    E --> H[DocumentManager.jsx]
    E --> I[ChatInterface.jsx]
    E --> J[AdminPanel.jsx]

    subgraph "Authentication Components"
        K[LoginPage.jsx]
        L[LoginForm.jsx]
        M[ProtectedRoute.jsx]
        N[ActivityTracker.jsx]
    end

    subgraph "Document Components"
        O[DocumentUpload.jsx]
        P[DocumentSummary.jsx]
        Q[DocumentViewer.jsx]
    end

    subgraph "Chat Components"
        R[ChatInterface.jsx]
        S[CollapsibleSourceChunk.jsx]
    end

    subgraph "Admin Components"
        T[UserEditor.jsx]
        U[IndexManager.jsx]
        V[ProcessingControl.jsx]
        W[RAGInterface.jsx]
    end
```

### State Management

```mermaid
graph TD
    subgraph "React Context Providers"
        A[AuthContext] --> B[User State]
        A --> C[Authentication Methods]
        A --> D[Token Management]
    end

    subgraph "Local State Management"
        E[Component State] --> F[useState Hooks]
        E --> G[useEffect Hooks]
        E --> H[Custom Hooks]
    end

    subgraph "Persistent Storage"
        I[localStorage] --> J[JWT Tokens]
        I --> K[User Preferences]
        I --> L[Session Data]
    end

    B --> E
    C --> E
    D --> I
```

### Service Layer

The frontend organizes API communication into a service layer with one service per domain:

```typescript
// Shape of the frontend service layer. Illustrative only: the frontend itself
// is written in JavaScript, and the individual service types are not spelled out here.
interface APIService {
    authService: AuthenticationService;
    documentService: DocumentManagementService;
    indexService: IndexManagementService;
    chatService: ChatService;
    adminService: AdminService;
}
```
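
As a sketch of one concrete service behind `APIService`, the chat service below receives its HTTP transport and token source as parameters, so the app could back it with Axios while tests use a stub. The request body fields (`index_id`, `query`), the response shape, and the `Bearer` header are assumptions based on the ChatMessage model, not a documented contract.

```typescript
// Sketch: chat service with an injected transport (hypothetical names and fields).

interface HttpTransport {
  post<T>(path: string, body: unknown, headers: Record<string, string>): Promise<T>;
}

interface ChatQueryResult {
  response: string;
  sources: unknown[];
}

interface TokenSource {
  getToken(): string | null; // e.g. read from localStorage in the real app
}

function createChatService(http: HttpTransport, tokens: TokenSource) {
  return {
    async query(indexId: string, query: string): Promise<ChatQueryResult> {
      const token = tokens.getToken();
      if (!token) {
        // Fail before sending a request the backend would reject with 401.
        throw new Error("Not authenticated");
      }
      return http.post<ChatQueryResult>(
        "/api/v1/chat/query",
        { index_id: indexId, query },
        { Authorization: `Bearer ${token}` },
      );
    },
  };
}
```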

## Backend Structure

### FastAPI Application Structure

```mermaid
graph TD
    A[main.py] --> B[FastAPI Application]
    B --> C[Middleware Stack]
    C --> D[CORS Middleware]
    C --> E[Authentication Middleware]
    C --> F[Request Timing Middleware]

    B --> G[API Routers]
    G --> H[Authentication Routes]
    G --> I[Document Routes]
    G --> J[Index Routes]
    G --> K[Chat Routes]
    G --> L[Admin Routes]

    subgraph "Core Services"
        M[Config Management]
        N[Database Connections]
        O[Cache Management]
        P[Security Utilities]
    end

    subgraph "Business Logic"
        Q[Document Processor]
        R[RAG Service]
        S[Chat Context Service]
        T[SSO Service]
    end

    H --> M
    I --> Q
    J --> R
    K --> S
    L --> T
```

### Service Architecture

```mermaid
graph TD
    subgraph "API Layer"
        A[FastAPI Routes]
    end

    subgraph "Service Layer"
        B[Document Processor Service]
        C[RAG Service]
        D[Chat Context Service]
        E[SSO Service]
        F[Contract Summary Service]
    end

    subgraph "Core Layer"
        G[Authentication Core]
        H[Security Core]
        I[Cache Core]
        J[ChromaDB Client]
    end

    subgraph "Data Layer"
        K[MongoDB Models]
        L[Pydantic Schemas]
        M[Database Utilities]
    end

    A --> B
    A --> C
    A --> D
    A --> E
    A --> F

    B --> G
    C --> H
    D --> I
    E --> J

    G --> K
    H --> L
    I --> M
```

## Database Schema

### MongoDB Collections

```mermaid
erDiagram
    users {
        ObjectId _id PK
        string email UK
        string hashed_password
        string role
        boolean is_active
        string auth_method
        string sso_provider
        array index_access
        datetime created_at
        datetime updated_at
    }

    indices {
        ObjectId _id PK
        string index_id UK
        string name
        string description
        ObjectId created_by FK
        string status
        int document_count
        object settings
        datetime created_at
    }

    documents {
        ObjectId _id PK
        string filename
        string index_id FK
        ObjectId uploaded_by FK
        string processing_status
        string embedding_status
        array text_chunks
        int chunk_count
        array vector_ids
        datetime created_at
    }

    chat_messages {
        ObjectId _id PK
        ObjectId user_id FK
        string index_id FK
        string query
        string response
        object debug_info
        float response_time
        boolean cached
        array sources
        datetime created_at
    }

    users ||--o{ indices : "creates"
    users ||--o{ documents : "uploads"
    users ||--o{ chat_messages : "sends"
    indices ||--o{ documents : "contains"
    indices ||--o{ chat_messages : "queries"
```

### ChromaDB Collections

```mermaid
graph TD
    A[ChromaDB Database] --> B["Collection: index_{index_id}"]
    B --> C[Document Vectors]
    C --> D[Vector Data]
    C --> E[Metadata]
    C --> F[Document IDs]

    E --> G[filename]
    E --> H[document_id]
    E --> I[chunk_index]
    E --> J[index_id]
    E --> K[upload_timestamp]
```
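
Writing one document's chunks into its index collection amounts to building parallel arrays, which is the shape a Chroma collection's `add` call accepts (`ids`, `documents`, `metadatas`, plus `embeddings`). In this sketch the metadata keys follow the diagram, embeddings are left out, and the `<document_id>_<chunk_index>` ID format is hypothetical.

```typescript
// Sketch: build the per-index collection batch for one document's chunks.
// Hypothetical: the "<document_id>_<chunk_index>" vector ID format.

interface ChunkMetadata {
  filename: string;
  document_id: string;
  chunk_index: number;
  index_id: string;
  upload_timestamp: string; // ISO-8601
}

interface ChromaBatch {
  collection: string;
  ids: string[];
  documents: string[];
  metadatas: ChunkMetadata[];
}

function buildChromaBatch(
  indexId: string,
  documentId: string,
  filename: string,
  chunks: string[],
  uploadedAt: Date,
): ChromaBatch {
  const batch: ChromaBatch = { collection: `index_${indexId}`, ids: [], documents: [], metadatas: [] };
  chunks.forEach((text, chunkIndex) => {
    batch.ids.push(`${documentId}_${chunkIndex}`);
    batch.documents.push(text);
    batch.metadatas.push({
      filename,
      document_id: documentId,
      chunk_index: chunkIndex,
      index_id: indexId,
      upload_timestamp: uploadedAt.toISOString(),
    });
  });
  return batch;
}
```

Storing the generated IDs in the document's `vector_ids` field would let a delete or reprocess remove exactly that document's vectors.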

### Redis Cache Structure

```mermaid
graph TD
    A[Redis Cache] --> B[Chat Responses]
    A --> C[User Sessions]
    A --> D[Index Metadata]

    B --> E["chat:{index_id}:{query_hash}"]
    C --> F["session:{user_id}"]
    D --> G["index_meta:{index_id}"]

    E --> H[Cached Response + Sources]
    F --> I[User State + Preferences]
    G --> J[Index Statistics]
```
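
The key patterns above can be produced by small builder functions. This sketch normalizes whitespace and case before hashing so trivially different spellings share a cache entry. 32-bit FNV-1a (over UTF-16 code units) is used only to keep the example dependency-free; the hash actually used for `query_hash` is not specified.

```typescript
// Sketch: Redis key builders for the three documented key patterns.

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex digits.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime mod 2^32
  }
  return hash.toString(16).padStart(8, "0");
}

// Collapse whitespace and case so equivalent queries map to the same key.
function normalizeQuery(query: string): string {
  return query.trim().replace(/\s+/g, " ").toLowerCase();
}

const redisKeys = {
  chat: (indexId: string, query: string): string => `chat:${indexId}:${fnv1a32(normalizeQuery(query))}`,
  session: (userId: string): string => `session:${userId}`,
  indexMeta: (indexId: string): string => `index_meta:${indexId}`,
};
```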

## Deployment Architecture

### Production Deployment

```mermaid
graph TD
    subgraph "Load Balancer"
        A[nginx/ALB]
    end

    subgraph "Application Tier"
        B[FastAPI Container 1]
        C[FastAPI Container 2]
        D[React Frontend]
    end

    subgraph "Data Tier"
        E[MongoDB Cluster]
        F[Redis Cluster]
        G[ChromaDB Persistent Volume]
        H[File Storage]
    end

    subgraph "External Services"
        I[OpenAI API]
        J[LlamaParse API]
        K[Azure AD]
    end

    A --> B
    A --> C
    A --> D

    B --> E
    B --> F
    B --> G
    B --> H
    C --> E
    C --> F
    C --> G
    C --> H

    B --> I
    B --> J
    B --> K
    C --> I
    C --> J
    C --> K
```

### Docker Deployment

```mermaid
graph TD
    A[docker-compose.yml] --> B[Frontend Container]
    A --> C[Backend Container]
    A --> D[MongoDB Container]
    A --> E[Redis Container]

    B --> F[nginx:alpine]
    C --> G[python:3.11]
    D --> H[mongo:latest]
    E --> I[redis:alpine]

    subgraph "Volumes"
        J[uploads_volume]
        K[indices_volume]
        L[mongo_data]
        M[redis_data]
    end

    C --> J
    C --> K
    D --> L
    E --> M
```

### Environment Configuration

```mermaid
graph TD
    A[Environment Variables] --> B[Database Config]
    A --> C[API Keys]
    A --> D[Security Settings]
    A --> E[Feature Flags]

    B --> F[MONGODB_URL]
    B --> G[REDIS_URL]

    C --> H[OPENAI_API_KEY]
    C --> I[LLAMAPARSE_API_KEY]

    D --> J[JWT_SECRET_KEY]
    D --> K[CORS_ORIGINS]

    E --> L[SSO_ENABLED]
    E --> M[CACHE_ENABLED]
    E --> N[DEBUG]
```
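
A startup check for these variables might look like the sketch below. The backend reads its configuration in Python, so this TypeScript version only illustrates the contract. Which variables are required, the comma-separated `CORS_ORIGINS` format, and the accepted flag values (`true`/`false`/`1`/`0`, default `false`) are assumptions.

```typescript
// Sketch: validate and type the documented environment variables.
// Assumptions: the five connection/secret variables are required; flags default to false.

type Env = Record<string, string | undefined>;

interface AppConfig {
  mongodbUrl: string;
  redisUrl: string;
  openaiApiKey: string;
  llamaparseApiKey: string;
  jwtSecretKey: string;
  corsOrigins: string[];
  ssoEnabled: boolean;
  cacheEnabled: boolean;
  debug: boolean;
}

// Record missing variables instead of throwing, so all of them are reported at once.
function requireVar(env: Env, key: string, missing: string[]): string {
  const value = env[key]?.trim();
  if (!value) {
    missing.push(key);
    return "";
  }
  return value;
}

function parseFlag(env: Env, key: string): boolean {
  const value = (env[key] ?? "").trim().toLowerCase();
  if (value === "" || value === "false" || value === "0") return false;
  if (value === "true" || value === "1") return true;
  throw new Error(`${key} must be true/false/1/0, got "${env[key]}"`);
}

function loadConfig(env: Env): AppConfig {
  const missing: string[] = [];
  const config: AppConfig = {
    mongodbUrl: requireVar(env, "MONGODB_URL", missing),
    redisUrl: requireVar(env, "REDIS_URL", missing),
    openaiApiKey: requireVar(env, "OPENAI_API_KEY", missing),
    llamaparseApiKey: requireVar(env, "LLAMAPARSE_API_KEY", missing),
    jwtSecretKey: requireVar(env, "JWT_SECRET_KEY", missing),
    corsOrigins: (env["CORS_ORIGINS"] ?? "")
      .split(",")
      .map((o) => o.trim())
      .filter((o) => o.length > 0),
    ssoEnabled: parseFlag(env, "SSO_ENABLED"),
    cacheEnabled: parseFlag(env, "CACHE_ENABLED"),
    debug: parseFlag(env, "DEBUG"),
  };
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return config;
}
```

Failing fast at startup with the full list of missing variables is easier to debug than a connection error on the first request.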

## Security Features

### Security Architecture

```mermaid
graph TD
    subgraph "Authentication Layer"
        A[JWT Tokens]
        B[Password Hashing]
        C[SSO Integration]
        D[Session Management]
    end

    subgraph "Authorization Layer"
        E[Role-Based Access]
        F[Index-Level Permissions]
        G[Admin Controls]
        H[User Restrictions]
    end

    subgraph "Data Security"
        I[Input Validation]
        J[NoSQL Injection Prevention]
        K[File Upload Validation]
        L[Data Encryption]
    end

    subgraph "Network Security"
        M[CORS Configuration]
        N[HTTPS Enforcement]
        O[Rate Limiting]
        P[API Security Headers]
    end

    A --> E
    B --> F
    C --> G
    D --> H

    E --> I
    F --> J
    G --> K
    H --> L

    I --> M
    J --> N
    K --> O
    L --> P
```

### Security Measures

1. **Authentication Security**
   - JWT tokens with configurable expiration
   - Salted bcrypt password hashing with a cost factor (rounds)
   - Azure AD integration with token validation
   - Automatic session cleanup

2. **Authorization Controls**
   - Role-based access control (Admin/User)
   - Index-level access permissions
   - Protected route implementation
   - Resource-level authorization checks

3. **Input Validation & Sanitization**
   - Pydantic schema validation
   - File type and size restrictions
   - NoSQL (MongoDB) injection prevention through the database access layer
   - XSS protection in the frontend

4. **Data Protection**
   - Hashed (bcrypt) password storage
   - Secure token transmission over HTTPS
   - Private document storage
   - Audit logging for admin actions

## Performance Optimizations

### Caching Strategy

```mermaid
graph TD
    A[Client Request] --> B{Cache Layer 1}
    B -->|Hit| C[Return Cached Response]
    B -->|Miss| D{Cache Layer 2}
    D -->|Hit| E[Return Database Cache]
    D -->|Miss| F[Process Request]
    F --> G[Update All Caches]
    G --> H[Return Response]

    subgraph "Cache Layers"
        I[Browser Cache]
        J[Redis Application Cache]
        K[Database Query Cache]
        L[Vector Search Cache]
    end
```
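
The layered lookup above can be sketched as a read-through cache over an ordered list of layers. `Map`-backed layers stand in for the browser, Redis, and database caches. Backfilling the faster layers after a hit in a slower one is an added design choice, since the diagram only shows caches being updated after a full miss.

```typescript
// Sketch: read-through lookup over ordered cache layers (fastest first).

interface CacheLayer<V> {
  name: string;
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

// In-memory layer used as a stand-in for a real cache backend.
function mapLayer<V>(name: string): CacheLayer<V> & { store: Map<string, V> } {
  const store = new Map<string, V>();
  return {
    name,
    store,
    get: (key: string) => store.get(key),
    set: (key: string, value: V) => {
      store.set(key, value);
    },
  };
}

function readThrough<V>(
  layers: CacheLayer<V>[],
  key: string,
  compute: () => V,
): { value: V; servedBy: string } {
  for (let i = 0; i < layers.length; i++) {
    const hit = layers[i].get(key);
    if (hit !== undefined) {
      for (let j = 0; j < i; j++) layers[j].set(key, hit); // backfill faster layers (design choice)
      return { value: hit, servedBy: layers[i].name };
    }
  }
  const value = compute();
  for (const layer of layers) layer.set(key, value); // "Update All Caches" after a full miss
  return { value, servedBy: "origin" };
}
```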

### Database Optimizations

1. **MongoDB Indexing Strategy**
   - Compound indexes on frequently queried fields
   - Text indexes for search functionality
   - TTL indexes for automatic cleanup
   - Index monitoring and optimization

2. **Query Optimization**
   - Aggregation pipeline optimization
   - Projection to reduce data transfer
   - Pagination for large result sets
   - Connection pooling for efficiency

3. **Vector Store Optimization**
   - Batch embedding generation
   - Optimized chunk sizes for retrieval
   - Index compression for storage efficiency
   - Similarity search optimization

### Frontend Performance

1. **Code Splitting**
   - Route-based code splitting
   - Lazy loading of components
   - Dynamic imports for optimization
   - Bundle size analysis

2. **Caching & Storage**
   - Service worker caching
   - Local storage optimization
   - API response caching
   - Static asset caching

3. **Rendering Optimization**
   - React.memo for expensive components
   - useCallback to keep callback references stable across renders
   - Virtual scrolling for large lists
   - Debounced search inputs

### Backend Performance

1. **Async Processing**
   - Non-blocking I/O operations
   - Background task processing
   - Queue-based document processing
   - Concurrent request handling

2. **Memory Management**
   - Efficient object lifecycle management
   - Memory pool optimization
   - Garbage collection tuning
   - Resource cleanup automation

3. **API Optimization**
   - Response compression
   - Pagination implementation
   - Field selection for responses
   - Request/response caching

---

## Conclusion

The Contract Analysis Tool v2.0 is a production-ready solution for document analysis and querying. Its architecture emphasizes scalability, security, and performance while remaining straightforward to use and deploy.

Key architectural strengths:

- **Modular Design**: Clear separation of concerns across service modules
- **Scalable Storage**: Hybrid database architecture optimized for different data types
- **Security-First**: SSO and local authentication with role- and index-level authorization
- **Performance-Optimized**: Multi-layer caching and async processing
- **Developer-Friendly**: Well-structured codebase with comprehensive documentation

The system is designed to handle enterprise-scale document processing workloads while providing an intuitive experience for both administrators and end users.