initial commit

This commit is contained in:
michael 2025-08-14 15:03:33 -05:00
commit 82be78c7ae
10829 changed files with 1270486 additions and 0 deletions

74
.gitignore vendored Normal file

@ -0,0 +1,74 @@
# Environment files - NEVER commit these
.env
.env.local
.env.production
# Virtual Environment
llama-index/
# Data and uploads (contains sensitive documents)
data/
indices/
# venv
venv/
# Logs (may contain sensitive information)
*.log
*.txt
# Keep dependency manifests tracked despite the *.txt rule above
!requirements.txt
# Python cache
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# OS files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# IDE files
.vscode/
.idea/
*.swp
*.swo
*~
# Database files
*.sqlite
*.sqlite3
*.db
# Temporary files
*.tmp
*.temp
*.backup
*.bak
# API keys or sensitive config (backup safety)
config.local.php
secrets.php
# Error logs
error_log
error.log

618
API_TESTING_GUIDE.md Normal file

@ -0,0 +1,618 @@
# Contract Analysis Tool - API Testing Guide
This guide gives step-by-step instructions, with example inputs, for testing every API in the Contract Analysis Tool.
## Prerequisites
1. **Backend Server Running**: `http://localhost:8000`
2. **Frontend Server Running**: `http://localhost:3000`
3. **MongoDB Running**: `localhost:27017`
4. **Redis Running**: `localhost:6379`
5. **Environment Variables**: Ensure `.env` files are properly configured
## Authentication Setup
### Step 1: Initialize Default Users
```bash
curl -X POST http://localhost:8000/api/v1/auth/init-users
```
**Expected Response:**
```json
{
"message": "Default users created successfully",
"admin_email": "admin@oliver.agency",
"user_email": "user@oliver.agency"
}
```
### Step 2: Test Health Check
```bash
curl http://localhost:8000/health
```
**Expected Response:**
```json
{
"status": "healthy",
"version": "2.0.0"
}
```
## Authentication APIs
### 1. Admin Login
```bash
curl -X POST http://localhost:8000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "admin@oliver.agency",
"password": "admin123"
}'
```
**Expected Response:**
```json
{
"access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"token_type": "bearer",
"user": {
"id": "...",
"email": "admin@oliver.agency",
"role": "admin",
"is_active": true,
"index_access": [...]
}
}
```
**Save the admin token:**
```bash
ADMIN_TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
```
### 2. User Login
```bash
curl -X POST http://localhost:8000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{
"email": "user@oliver.agency",
"password": "user123"
}'
```
**Save the user token:**
```bash
USER_TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
```
### 3. Get Current User Info
```bash
curl -X GET http://localhost:8000/api/v1/auth/me \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
## Admin APIs
### User Management
#### 1. Get All Users
```bash
curl -X GET http://localhost:8000/api/v1/admin/users \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
#### 2. Create New User
```bash
curl -X POST http://localhost:8000/api/v1/admin/users \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"email": "testuser@example.com",
"password": "SecurePass123",
"role": "user",
"is_active": true
}'
```
**Save the user ID from response:**
```bash
NEW_USER_ID="686ec44005b0398525fde787"
```
#### 3. Update User
```bash
curl -X PUT http://localhost:8000/api/v1/admin/users/$NEW_USER_ID \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"email": "updated_testuser@example.com",
"role": "admin",
"is_active": false
}'
```
#### 4. Delete User
```bash
curl -X DELETE http://localhost:8000/api/v1/admin/users/$NEW_USER_ID \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
### Index Management
#### 1. Get All Indices
```bash
curl -X GET http://localhost:8000/api/v1/admin/indices \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
#### 2. Create New Index
```bash
curl -X POST http://localhost:8000/api/v1/admin/indices/create \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-F "name=Test Contract Index" \
-F "description=Index for testing contract analysis" \
-F "chunk_size=1000" \
-F "chunk_overlap=200"
```
**Save the index ID from response:**
```bash
TEST_INDEX_ID="test-index-2025-07-09-abc123"
```
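The `chunk_size` and `chunk_overlap` fields in the Create New Index request control how each document is split before embedding. The sketch below is only an illustration of the idea: it splits by characters, whereas the backend delegates splitting to LlamaIndex, whose splitter may count tokens instead. `chunk_text` is a hypothetical helper, not part of this API.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters sharing chunk_overlap characters.

    Illustrative only: character-based, not the project's actual splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the values used above (1000 and 200), each chunk repeats the last 200 characters of the previous one, so a clause that straddles a boundary still appears whole in at least one chunk.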
#### 3. Delete Index
```bash
curl -X DELETE http://localhost:8000/api/v1/admin/indices/$TEST_INDEX_ID \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
### Document Management
#### 1. Upload Single Document
```bash
curl -X POST http://localhost:8000/api/v1/admin/documents/upload-single \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-F "file=@/path/to/your/document.pdf" \
-F "index_id=$TEST_INDEX_ID" \
-F "custom_name=Test Contract Document"
```
#### 2. Upload Multiple Documents
```bash
curl -X POST http://localhost:8000/api/v1/admin/documents/upload-multiple \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-F "files=@/path/to/document1.pdf" \
-F "files=@/path/to/document2.pdf" \
-F "index_id=$TEST_INDEX_ID" \
-F "base_name=Contract Batch"
```
#### 3. Get Index Documents
```bash
curl -X GET http://localhost:8000/api/v1/admin/documents/$TEST_INDEX_ID \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
**Save a document ID from response:**
```bash
DOCUMENT_ID="686ebfa705b0398525fde785"
```
#### 4. Reprocess Document
```bash
curl -X POST http://localhost:8000/api/v1/admin/documents/$DOCUMENT_ID/reprocess \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
#### 5. Delete Document
```bash
curl -X DELETE http://localhost:8000/api/v1/admin/documents/$DOCUMENT_ID \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
### Index Access Management
#### 1. Grant Index Access to User
```bash
curl -X POST http://localhost:8000/api/v1/admin/users/$USER_ID/grant-index-access \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"index_id": "'$TEST_INDEX_ID'"
}'
```
#### 2. Revoke Index Access from User
```bash
curl -X POST http://localhost:8000/api/v1/admin/users/$USER_ID/revoke-index-access \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"index_id": "'$TEST_INDEX_ID'"
}'
```
#### 3. Grant All Indices Access
```bash
curl -X POST http://localhost:8000/api/v1/admin/grant-all-indices/$USER_ID \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
### System Monitoring
#### 1. Get System Statistics
```bash
curl -X GET http://localhost:8000/api/v1/admin/stats \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
#### 2. Get Processing Status
```bash
curl -X GET http://localhost:8000/api/v1/admin/documents/processing-status \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
#### 3. Process Pending Documents
```bash
curl -X POST http://localhost:8000/api/v1/admin/documents/process-pending \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
### Admin RAG Query
```bash
curl -X POST http://localhost:8000/api/v1/admin/chat/query \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-F "query=What are the key terms of this contract?" \
-F "index_id=$TEST_INDEX_ID" \
-F "top_k=5"
```
## User APIs
### Index Access
#### 1. Get User's Indices
```bash
curl -X GET http://localhost:8000/api/v1/indices/ \
-H "Authorization: Bearer $USER_TOKEN"
```
#### 2. Get Specific Index
```bash
curl -X GET http://localhost:8000/api/v1/indices/$TEST_INDEX_ID \
-H "Authorization: Bearer $USER_TOKEN"
```
#### 3. Create User Index
```bash
curl -X POST http://localhost:8000/api/v1/indices/create \
-H "Authorization: Bearer $USER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "My Contract Index",
"description": "Personal contract analysis index"
}'
```
### Document Operations
#### 1. Upload Document to Index
```bash
curl -X POST http://localhost:8000/api/v1/documents/upload \
-H "Authorization: Bearer $USER_TOKEN" \
-F "file=@/path/to/contract.pdf" \
-F "index_id=$USER_INDEX_ID"
```
#### 2. Get Documents by Index
```bash
curl -X GET http://localhost:8000/api/v1/documents/index/$USER_INDEX_ID \
-H "Authorization: Bearer $USER_TOKEN"
```
#### 3. Get Document Details
```bash
curl -X GET http://localhost:8000/api/v1/documents/$DOCUMENT_ID \
-H "Authorization: Bearer $USER_TOKEN"
```
#### 4. Get Document Summary (AI)
```bash
curl -X GET http://localhost:8000/api/v1/documents/$DOCUMENT_ID/summary \
-H "Authorization: Bearer $USER_TOKEN"
```
**Expected Response:**
```json
{
"document_id": "686ebfa705b0398525fde785",
"filename": "Contract Document.pdf",
"summary": "This document is a service agreement between...",
"processing_status": "completed",
"generated_at": "2025-07-09T19:41:12.301719"
}
```
#### 5. Download Document
```bash
curl -X GET http://localhost:8000/api/v1/documents/$DOCUMENT_ID/download \
-H "Authorization: Bearer $USER_TOKEN" \
--output downloaded_document.pdf
```
#### 6. Delete Document
```bash
curl -X DELETE http://localhost:8000/api/v1/documents/$DOCUMENT_ID \
-H "Authorization: Bearer $USER_TOKEN"
```
### Chat and RAG
#### 1. Query Documents (RAG)
```bash
curl -X POST http://localhost:8000/api/v1/chat/query \
-H "Authorization: Bearer $USER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "What are the payment terms in this contract?",
"index_id": "'$USER_INDEX_ID'",
"top_k": 5
}'
```
**Expected Response:**
```json
{
"response": "Based on the contract analysis, the payment terms include...",
"cached": false,
"response_time": 2.34,
"debug_info": {
"sources": [...],
"context_used": true,
"context_messages_count": 3
}
}
```
#### 2. Get Chat History
```bash
curl -X GET http://localhost:8000/api/v1/chat/history/$USER_INDEX_ID \
-H "Authorization: Bearer $USER_TOKEN"
```
#### 3. Clear Chat History
```bash
curl -X DELETE http://localhost:8000/api/v1/chat/history/$USER_INDEX_ID \
-H "Authorization: Bearer $USER_TOKEN"
```
#### 4. Get Index Chat Status
```bash
curl -X GET http://localhost:8000/api/v1/chat/status/$USER_INDEX_ID \
-H "Authorization: Bearer $USER_TOKEN"
```
## Permission Testing
### Test User Access Restrictions
#### 1. User Trying to Access Admin Endpoint (Should Fail)
```bash
curl -X GET http://localhost:8000/api/v1/admin/users \
-H "Authorization: Bearer $USER_TOKEN"
```
**Expected Response:**
```json
{
"detail": "Not enough permissions"
}
```
#### 2. User Accessing Unauthorized Index (Should Fail)
```bash
curl -X GET http://localhost:8000/api/v1/documents/index/unauthorized-index-id \
-H "Authorization: Bearer $USER_TOKEN"
```
#### 3. Admin Accessing Any Resource (Should Work)
```bash
curl -X GET http://localhost:8000/api/v1/documents/index/$ANY_INDEX_ID \
-H "Authorization: Bearer $ADMIN_TOKEN"
```
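The three permission cases above can be summarized as a small access rule. This is a sketch under assumptions, not the backend's actual dependency code: the field names come from the login response shown earlier (`role`, `is_active`, `index_access`), while the `User` class and both functions are hypothetical, as is the rule that inactive accounts are always denied.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    email: str
    role: str  # "admin" or "user"
    is_active: bool = True
    index_access: list[str] = field(default_factory=list)


def can_use_admin_api(user: User) -> bool:
    """Admin endpoints: only active admins (assumed rule)."""
    return user.is_active and user.role == "admin"


def can_access_index(user: User, index_id: str) -> bool:
    """Index-scoped endpoints: admins see everything, users only granted indices."""
    if not user.is_active:
        return False
    if user.role == "admin":
        return True
    return index_id in user.index_access
```

A request that fails `can_use_admin_api` corresponds to the "Not enough permissions" response in case 1, and a failed `can_access_index` corresponds to case 2.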
## Frontend Testing
### 1. Access Frontend
```bash
# Open browser to
http://localhost:3000
```
### 2. Test Login Flow
1. Navigate to `http://localhost:3000/login`
2. Login with admin credentials: `admin@oliver.agency` / `admin123`
3. Verify dashboard access
4. Check admin panel visibility in sidebar
### 3. Test Admin Panel
1. Navigate to `http://localhost:3000/dashboard/admin`
2. Test user management (create, edit, delete)
3. Test index access management
4. Verify system statistics
### 4. Test User Features
1. Login as user: `user@oliver.agency` / `user123`
2. Test document upload and management
3. Test AI summary generation
4. Test chat interface with documents
## Error Testing
### 1. Invalid Authentication
```bash
curl -X GET http://localhost:8000/api/v1/admin/users \
-H "Authorization: Bearer invalid_token"
```
### 2. Missing Required Fields
```bash
curl -X POST http://localhost:8000/api/v1/admin/users \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"email": "incomplete@user.com"
}'
```
### 3. Duplicate Email
```bash
curl -X POST http://localhost:8000/api/v1/admin/users \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"email": "admin@oliver.agency",
"password": "test123",
"role": "user"
}'
```
## Performance Testing
### 1. Large File Upload
```bash
curl -X POST http://localhost:8000/api/v1/documents/upload \
-H "Authorization: Bearer $USER_TOKEN" \
-F "file=@/path/to/large_document.pdf" \
-F "index_id=$USER_INDEX_ID" \
--max-time 300
```
### 2. Concurrent Chat Queries
```bash
# Run multiple queries simultaneously
for i in {1..5}; do
curl -X POST http://localhost:8000/api/v1/chat/query \
-H "Authorization: Bearer $USER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "Query '$i': What is the contract about?",
"index_id": "'$USER_INDEX_ID'"
}' &
done
wait
```
## Complete Test Workflow
### 1. Setup Test Environment
```bash
# Start services
cd /path/to/backend && uvicorn app.main:app --reload --host 0.0.0.0 --port 8000 &
cd /path/to/frontend && npm run dev &
# Initialize users
curl -X POST http://localhost:8000/api/v1/auth/init-users
```
### 2. Login and Get Tokens
```bash
ADMIN_TOKEN=$(curl -s -X POST http://localhost:8000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "admin@oliver.agency", "password": "admin123"}' \
| jq -r '.access_token')
USER_TOKEN=$(curl -s -X POST http://localhost:8000/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email": "user@oliver.agency", "password": "user123"}' \
| jq -r '.access_token')
```
### 3. Create Test Index
```bash
INDEX_RESPONSE=$(curl -s -X POST http://localhost:8000/api/v1/admin/indices/create \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-F "name=API Test Index" \
-F "description=Index for API testing")
TEST_INDEX_ID=$(echo "$INDEX_RESPONSE" | jq -r '.index_id')
```
### 4. Upload and Process Document
```bash
curl -X POST http://localhost:8000/api/v1/admin/documents/upload-single \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-F "file=@/path/to/test_contract.pdf" \
-F "index_id=$TEST_INDEX_ID" \
-F "custom_name=Test Contract"
```
### 5. Grant User Access
```bash
USER_ID=$(curl -s -X GET http://localhost:8000/api/v1/admin/users \
-H "Authorization: Bearer $ADMIN_TOKEN" \
| jq -r '.[] | select(.email=="user@oliver.agency") | .id')
curl -X POST http://localhost:8000/api/v1/admin/users/$USER_ID/grant-index-access \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"index_id": "'$TEST_INDEX_ID'"}'
```
### 6. Test User Chat
```bash
curl -X POST http://localhost:8000/api/v1/chat/query \
-H "Authorization: Bearer $USER_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "Summarize this contract",
"index_id": "'$TEST_INDEX_ID'"
}'
```
### 7. Test AI Summary
```bash
DOCUMENT_ID=$(curl -s -X GET http://localhost:8000/api/v1/admin/documents/$TEST_INDEX_ID \
-H "Authorization: Bearer $ADMIN_TOKEN" \
| jq -r '.documents[0].id')
curl -X GET http://localhost:8000/api/v1/documents/$DOCUMENT_ID/summary \
-H "Authorization: Bearer $USER_TOKEN"
```
## Troubleshooting
### Common Issues
1. **401 Unauthorized**: Check token validity and format
2. **403 Forbidden**: Verify user has required permissions
3. **404 Not Found**: Ensure resource exists and user has access
4. **422 Validation Error**: Check request body format and required fields
5. **500 Internal Server Error**: Check backend logs and OpenAI API key
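When diagnosing a 401, it can help to look inside the token. The sketch below decodes the payload of a JWT without verifying the signature, so it is useful for inspection only and never for authentication. It assumes the tokens carry a standard `exp` claim, which the `JWT_EXPIRE_MINUTES` setting suggests but this guide does not confirm; both function names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_payload(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload_b64 = parts[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """True if the token has an `exp` claim that is at or before `now` (Unix seconds)."""
    exp = decode_jwt_payload(token).get("exp")
    if exp is None:
        return False  # no expiry claim present, so nothing to compare
    current = time.time() if now is None else now
    return current >= exp
```

If `is_expired` returns `True` for `$ADMIN_TOKEN` or `$USER_TOKEN`, logging in again should yield a fresh token.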
### Debug Commands
```bash
# Check backend logs
tail -f backend.log
# Check database connection
curl http://localhost:8000/health
# Verify token (set TOKEN to $ADMIN_TOKEN or $USER_TOKEN first)
curl -X GET http://localhost:8000/api/v1/auth/me \
-H "Authorization: Bearer $TOKEN"
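# Check that the OpenAI API key is set (without printing the secret)
[ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set" || echo "OPENAI_API_KEY is not set"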
```
## Notes
- Replace `/path/to/your/document.pdf` with actual file paths
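- Replace placeholder IDs with actual IDs from responses; `$USER_INDEX_ID` and `$ANY_INDEX_ID` are never assigned in this guide, and `$USER_ID` is only assigned in the Complete Test Workflow, so set them yourself before running the commands that use them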
- Ensure all environment variables are properly set
- All timestamps are in UTC
- File uploads support PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF formats
- Maximum file size is 50MB (configurable)

297
CLAUDE.md Normal file

@ -0,0 +1,297 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Contract Analysis Tool v2.0 is a production-ready Retrieval-Augmented Generation (RAG) application for contract analysis and document Q&A. It consists of a FastAPI backend and a React frontend.
## Architecture
**Stack:**
- **Backend:** FastAPI + MongoDB + Redis + ChromaDB
- **Frontend:** React + Vite + Tailwind CSS
- **AI/ML:** OpenAI GPT-4, LlamaIndex, ChromaDB for vector storage
- **Authentication:** JWT-based with role-based access control
**Data Flow:**
```
React Frontend → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                        ↓
                   Redis Cache
```
## Development Commands
### Backend (FastAPI)
**Start development server:**
```bash
cd backend
source venv/bin/activate # On Windows: venv\Scripts\activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
**Install dependencies:**
```bash
cd backend
pip install -r requirements.txt
```
**Database setup:**
- MongoDB runs on port 27017
- Redis runs on port 6379
- Application auto-creates collections/indexes on startup
**Initialize default users:**
```bash
curl -X POST http://localhost:8000/api/v1/auth/init-users
```
**Health check:**
```bash
curl http://localhost:8000/health
```
### Frontend (React)
**Start development server:**
```bash
cd frontend
npm run dev
```
**Build for production:**
```bash
cd frontend
npm run build
```
**Lint code:**
```bash
cd frontend
npm run lint
```
**Install dependencies:**
```bash
cd frontend
npm install
```
### Docker Development
**Start all services:**
```bash
cd backend
docker-compose up -d
```
**Backend only (with external DB):**
```bash
cd backend
docker-compose up -d mongo redis
```
## Project Structure
### Backend (`/backend`)
- `app/main.py` - FastAPI application entry point
- `app/config/settings.py` - Environment configuration and database settings
- `app/api/v1/` - API endpoints (auth, documents, indices, chat, admin)
- `app/models/` - MongoDB data models (user, document, index, chat)
- `app/services/` - Business logic (document_processor, rag_service)
- `app/core/` - Core utilities (auth, security, cache)
- `app/utils/` - Helper utilities (file_utils)
### Frontend (`/frontend`)
- `src/App.jsx` - Main React application with routing
- `src/pages/` - Page components (Dashboard, DocumentManager, ChatInterface, AdminPanel)
- `src/components/` - Reusable UI components organized by feature
- `src/services/` - API service layer (authService, documentService, chatService, indexService)
- `src/context/` - React context providers (AuthContext)
- `src/utils/` - Frontend utilities and constants
## Key Features & Workflows
### Authentication System
- JWT-based authentication with role-based access (admin/user)
- Default users: `admin@oliver.agency`/`admin123`, `user@oliver.agency`/`user123`
- Protected routes with automatic token refresh
### Document Processing Pipeline
1. **Upload** → Document uploaded via React frontend
2. **Process** → Backend processes with LlamaIndex (PDF parsing, chunking)
3. **Index** → Embeddings stored in ChromaDB, metadata in MongoDB
4. **Query** → Natural language queries via RAG system
### Index Management
- Users can create document indices for organizing documents
- Role-based access control for index management
- ChromaDB handles vector storage, MongoDB stores metadata
### Chat System
- **Context-Aware Conversations**: AI remembers previous 10 messages within 24-hour window
- **Real-time document Q&A** using RAG with source citations
- **Proper message ordering** - chronological display with correct timestamps
- **Conversation continuity** - responses reference previous context when relevant
- **Configurable top-k** results for query precision (3, 5, 10, 15)
- **Smart caching** - context-dependent responses aren't cached, simple queries are
- **Session statistics** - track response times, cache hit rates, message counts
## Environment Configuration
### Backend (`.env`)
```env
# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis
# Redis
REDIS_URL=redis://localhost:6379
# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30
# OpenAI
OPENAI_API_KEY=your-openai-api-key
LLAMAPARSE_API_KEY=your-llamaparse-api-key
# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices
# Cache
CACHE_ENABLED=true
CACHE_TTL=3600
```
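The variables above mix plain strings, integers, a boolean, and a JSON list (`CORS_ORIGINS`). As a rough sketch of how such values could be parsed, using only the standard library: the real `app/config/settings.py` is not shown here and may well use a settings library instead, so `Settings` and `load_settings` are hypothetical names, and the defaults simply mirror the example file.

```python
import json
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    redis_url: str
    jwt_expire_minutes: int
    debug: bool
    cors_origins: list[str]
    cache_ttl: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment-style key/value pairs (defaults mirror the example)."""
    env = os.environ if env is None else env
    return Settings(
        mongodb_url=env.get("MONGODB_URL", "mongodb://localhost:27017"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        jwt_expire_minutes=int(env.get("JWT_EXPIRE_MINUTES", "30")),
        debug=env.get("DEBUG", "false").strip().lower() == "true",
        # CORS_ORIGINS is written as a JSON array, so it must be parsed, not split on commas
        cors_origins=json.loads(env.get("CORS_ORIGINS", '["http://localhost:3000"]')),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check in isolation.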
### Frontend (`.env`)
```env
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
```
## API Endpoints
**Authentication:**
- `POST /api/v1/auth/login` - User login
- `POST /api/v1/auth/init-users` - Initialize default users
**Documents:**
- `POST /api/v1/documents/upload` - Upload documents to index
- `GET /api/v1/documents/index/{index_id}` - List documents in index
**Indices:**
- `POST /api/v1/indices/create` - Create new document index
- `GET /api/v1/indices/` - List user's indices
**Chat:**
- `POST /api/v1/chat/query` - Query documents with natural language
**Admin:**
- `GET /api/v1/admin/stats` - System statistics (admin only)
- `POST /api/v1/admin/documents/upload-single` - Upload single document
- `POST /api/v1/admin/documents/upload-multiple` - Upload multiple documents
- `GET /api/v1/admin/documents/{index_id}` - Get index documents
- `POST /api/v1/admin/documents/{document_id}/reprocess` - Reprocess document
- `DELETE /api/v1/admin/documents/{document_id}` - Delete document
- `GET /api/v1/admin/indices` - Get all indices
- `POST /api/v1/admin/indices/create` - Create new index
- `POST /api/v1/admin/chat/query` - RAG query interface
## Development Notes
### Database Connections
- MongoDB connection pooling handled automatically
- Redis connection with fallback if unavailable
- ChromaDB indices stored in `./indices` directory
### File Handling
- Uploads stored in `./uploads/{index_id}/` directory structure
- Supported formats: PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF
- 50MB file size limit (configurable)
- Automatic file naming for batch uploads
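The format and size rules above amount to a simple pre-upload check. The following is a minimal sketch, assuming that "50MB" means 50 × 1024 × 1024 bytes and that the check is done on the file extension alone; `validate_upload` is a hypothetical helper, and the backend's actual validation in `app/utils/` may differ.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".csv", ".json", ".html", ".md", ".rtf"}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumption: 50MB interpreted as 50 MiB


def validate_upload(filename: str, size_bytes: int,
                    max_bytes: int = MAX_FILE_SIZE_BYTES) -> list[str]:
    """Return a list of human-readable problems; an empty list means the file is acceptable."""
    errors = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > max_bytes:
        errors.append(f"file too large: {size_bytes} bytes (limit {max_bytes})")
    return errors
```

Returning every problem at once, rather than raising on the first, lets a client show all of them in one error message.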
### Caching Strategy
- Redis caches API responses for performance
- TTL configurable via `CACHE_TTL` environment variable
- Cache keys include user context for security
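To illustrate why cache keys include user context, here is one way such a key could be built. This is a sketch only: the key layout, the `rag:` prefix, and the `make_cache_key` name are assumptions, not the scheme used in `app/core/cache.py`.

```python
import hashlib
import json


def make_cache_key(user_id: str, index_id: str, query: str, top_k: int = 5) -> str:
    """Derive a deterministic cache key scoped to one user, index, query, and top_k."""
    payload = json.dumps(
        {
            "user_id": user_id,
            "index_id": index_id,
            "query": query.strip().lower(),  # normalize trivial differences in the query text
            "top_k": top_k,
        },
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"rag:{user_id}:{digest}"
```

Because `user_id` is part of the hashed payload, two users asking the same question against the same index get different keys, so one user's cached answer is never served to another.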
### Document Processing
- Async processing with database status tracking
- Processing states: pending → processing → completed/failed
- Embedding states: pending → processing → completed/failed
- Automatic retry capability for failed documents
- Chunk count and vector ID tracking in MongoDB
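The processing states listed above form a small state machine. A sketch of how the transitions could be enforced is shown below; it assumes a retry moves a failed document back to `pending` and treats `completed` as final, which ignores the admin reprocess endpoint. `ALLOWED_TRANSITIONS` and `advance` are hypothetical names.

```python
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
    "failed": {"pending"},  # assumption: a retry re-queues the document
    "completed": set(),     # assumption: final state (reprocessing is not modeled here)
}


def advance(current: str, new: str) -> str:
    """Return the new state if the transition is allowed, otherwise raise ValueError."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown state: {current}")
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```

The same table could be applied independently to the processing status and the embedding status, since both follow the same sequence.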
### Vector Storage
- ChromaDB persistent storage in `./indices/chroma_db/`
- Collections named `index_{index_id}` for organization
- Metadata includes document_id, chunk_index, index_id
- Configurable similarity search with top-k results
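The naming scheme and top-k search described above can be illustrated in plain Python. In the application ChromaDB performs the search itself; the sketch below only shows the idea, and it uses cosine similarity for illustration because the distance metric configured in ChromaDB is not stated here. All function names are hypothetical.

```python
import math


def collection_name(index_id: str) -> str:
    """Collection naming convention described above: index_{index_id}."""
    return f"index_{index_id}"


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[dict], k: int = 5) -> list[dict]:
    """Return the k chunks whose `embedding` is most similar to query_vec, best first."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk["embedding"]),
        reverse=True,
    )
    return ranked[:k]
```

A larger `top_k` gives the language model more context to draw on at the cost of including less relevant chunks.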
### Chat Context System
- **Context Window**: 24-hour rolling window with max 10 previous messages
- **Smart Context**: AI uses conversation history for continuity and follow-up questions
- **Context Caching**: Responses with context aren't cached (dynamic), simple queries are cached
- **Database Storage**: All messages stored with proper timestamps and context metadata
- **Context Display**: Frontend shows when context is used and how many previous messages
- **Session Management**: Track conversation statistics and context usage
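The context window rule (at most 10 previous messages from the last 24 hours) could be implemented along these lines. This is a sketch, assuming each stored message is a dict with a timezone-aware `created_at` value; `select_context` is a hypothetical function, not the one in `rag_service`.

```python
from datetime import datetime, timedelta, timezone


def select_context(messages: list[dict], now: datetime,
                   max_messages: int = 10,
                   window: timedelta = timedelta(hours=24)) -> list[dict]:
    """Return up to max_messages of the most recent messages inside the window, oldest first."""
    if max_messages <= 0:
        return []
    cutoff = now - window
    recent = [m for m in messages if m["created_at"] >= cutoff]
    recent.sort(key=lambda m: m["created_at"])  # chronological order
    return recent[-max_messages:]  # keep only the newest max_messages entries
```

A caller would pass `datetime.now(timezone.utc)` as `now`; taking it as a parameter keeps the function deterministic for testing.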
### Message Ordering & Timestamps
- **Chronological Order**: Messages displayed in proper time sequence (oldest → newest)
- **Accurate Timestamps**: Server-side timestamp generation with UTC storage
- **Separate Timestamps**: User and assistant messages have distinct timestamps
- **Proper Database Storage**: `created_at`, `user_timestamp`, and `assistant_timestamp` fields
- **Frontend Display**: Localized timestamp formatting with date and time
- **Context Indicators**: Visual indicators show when AI used previous conversation context
### Error Handling & Validation
- **Collection Validation**: Check ChromaDB collection exists before querying
- **Document Status Check**: Verify documents are fully processed before chat
- **Graceful Degradation**: Fallback responses when context generation fails
- **User-Friendly Errors**: Clear, actionable error messages with next steps
- **Progress Tracking**: Real-time status updates during document processing
### Progress Visualization
- **Upload Progress**: Real-time progress bars during file uploads
- **Processing Status**: Visual indicators for document processing stages
- **Embedding Progress**: Separate progress tracking for text processing and embedding
- **Success States**: Clear visual feedback when operations complete
- **Status Dashboard**: Comprehensive view of document processing pipeline
### Security Features
- JWT token validation on protected routes
- Input validation with Pydantic schemas
- CORS configuration for frontend integration
- File upload validation and sanitization
## Testing
### Backend Testing
```bash
cd backend
# API documentation available at http://localhost:8000/docs
# Manual testing via Swagger UI
```
### Frontend Testing
- React components use modern hooks patterns
- Error boundaries for graceful error handling
- Loading states for better UX
## Migration Context
This application was migrated from PHP/Python to FastAPI/React. The migration provides:
- Complete feature parity with the original application
- All document processing capabilities
- ChromaDB indices compatibility
- Enhanced performance and security
- Modern, responsive UI
The `MIGRATION_PLAN.md` file contains detailed information about the migration process and architecture decisions.

326
MIGRATION_PLAN.md Normal file

@ -0,0 +1,326 @@
# Migration Plan: PHP/Python → FastAPI/React
## Overview
This document outlines the migration strategy for moving the current PHP/Python hybrid RAG application to a FastAPI backend with a React frontend.
## Current Application Analysis
### Existing Features
- **User Authentication**: Role-based access (admin/user) with SQLite storage
- **Document Management**: File uploads, processing, and indexing
- **RAG System**: LlamaIndex + ChromaDB for document retrieval
- **Contract Analysis**: GPT-4 powered contract field extraction
- **Chat Interface**: Natural language document Q&A
- **Caching System**: Response caching for performance
- **Index Management**: User-specific access control to document indices
### Current Architecture
```
PHP Frontend (Web UI) → Python Backend (Processing) → OpenAI API
         ↓                          ↓
     SQLite DB               ChromaDB Vectors
```
## New Architecture
### Target Architecture
```
React Frontend → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                        ↓
                   Redis Cache
```
## Project Structure
### Backend (FastAPI)
```
backend/
├── app/
│   ├── __init__.py
│   ├── main.py  # FastAPI app entry point
│   ├── config/
│   │   ├── __init__.py
│   │   ├── settings.py  # Environment configuration
│   │   └── database.py  # MongoDB connection
│   ├── models/
│   │   ├── __init__.py
│   │   ├── user.py  # User data models
│   │   ├── document.py  # Document models
│   │   ├── index.py  # Index models
│   │   └── chat.py  # Chat/query models
│   ├── schemas/
│   │   ├── __init__.py
│   │   ├── user.py  # Pydantic schemas
│   │   ├── document.py
│   │   ├── index.py
│   │   └── chat.py
│   ├── api/
│   │   ├── __init__.py
│   │   ├── deps.py  # Dependencies
│   │   └── v1/
│   │       ├── __init__.py
│   │       ├── auth.py  # Authentication endpoints
│   │       ├── documents.py  # Document management
│   │       ├── indices.py  # Index management
│   │       ├── chat.py  # Chat/query endpoints
│   │       └── admin.py  # Admin endpoints
│   ├── core/
│   │   ├── __init__.py
│   │   ├── auth.py  # JWT authentication
│   │   ├── security.py  # Security utilities
│   │   └── cache.py  # Redis caching
│   ├── services/
│   │   ├── __init__.py
│   │   ├── document_processor.py  # Document processing service
│   │   ├── rag_service.py  # RAG retrieval service
│   │   ├── index_service.py  # Index management service
│   │   └── openai_service.py  # OpenAI integration
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── file_utils.py  # File handling utilities
│   │   └── llama_utils.py  # LlamaIndex utilities
│   └── middleware/
│       ├── __init__.py
│       ├── cors.py  # CORS middleware
│       └── logging.py  # Request logging
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── .env.example
```
### Frontend (React)
```
frontend/
├── public/
│   ├── index.html
│   └── favicon.ico
├── src/
│   ├── components/
│   │   ├── common/
│   │   │   ├── Header.jsx
│   │   │   ├── Sidebar.jsx
│   │   │   ├── Layout.jsx
│   │   │   └── LoadingSpinner.jsx
│   │   ├── auth/
│   │   │   ├── LoginForm.jsx
│   │   │   └── ProtectedRoute.jsx
│   │   ├── documents/
│   │   │   ├── DocumentUpload.jsx
│   │   │   ├── DocumentList.jsx
│   │   │   └── DocumentViewer.jsx
│   │   ├── chat/
│   │   │   ├── ChatInterface.jsx
│   │   │   ├── MessageList.jsx
│   │   │   └── MessageInput.jsx
│   │   ├── indices/
│   │   │   ├── IndexList.jsx
│   │   │   ├── IndexManager.jsx
│   │   │   └── CreateIndex.jsx
│   │   └── admin/
│   │       ├── UserManagement.jsx
│   │       └── SystemMonitor.jsx
│   ├── hooks/
│   │   ├── useAuth.js
│   │   ├── useDocuments.js
│   │   ├── useChat.js
│   │   └── useIndices.js
│   ├── services/
│   │   ├── api.js  # Axios configuration
│   │   ├── authService.js  # Authentication API calls
│   │   ├── documentService.js  # Document API calls
│   │   ├── chatService.js  # Chat API calls
│   │   └── indexService.js  # Index API calls
│   ├── context/
│   │   ├── AuthContext.js
│   │   └── ThemeContext.js
│   ├── utils/
│   │   ├── constants.js
│   │   ├── helpers.js
│   │   └── validation.js
│   ├── styles/
│   │   ├── globals.css
│   │   └── components/
│   ├── App.jsx
│   ├── index.js
│   └── routes.js
├── package.json
├── tailwind.config.js
├── vite.config.js
└── .env.example
```
## Technology Stack
### Backend
- **FastAPI**: Modern, fast web framework for Python APIs
- **MongoDB**: Document database for user data, metadata
- **ChromaDB**: Vector database for document embeddings (kept from current)
- **Redis**: Caching layer for improved performance
- **Pydantic**: Data validation and serialization
- **JWT**: Token-based authentication
- **LlamaIndex**: RAG framework (kept from current)
- **OpenAI**: GPT-4 for analysis and embeddings
### Frontend
- **React 18**: Modern React with hooks
- **Vite**: Fast build tool and dev server
- **Tailwind CSS**: Utility-first CSS framework
- **Axios**: HTTP client for API calls
- **React Router**: Client-side routing
- **React Hook Form**: Form handling
- **Zustand**: State management
- **React Query**: Server state management
## Migration Strategy
### Phase 1: Backend Foundation
1. Set up FastAPI project structure
2. Configure MongoDB connection
3. Implement user authentication with JWT
4. Create data models and schemas
5. Set up Redis caching
### Phase 2: Core Services
1. Port document processing pipeline
2. Implement RAG service with LlamaIndex
3. Create OpenAI integration service
4. Implement index management
5. Set up file upload handling
### Phase 3: API Endpoints
1. Authentication endpoints
2. Document management endpoints
3. Chat/query endpoints
4. Index management endpoints
5. Admin endpoints
### Phase 4: Frontend Development
1. Set up React project with Vite
2. Create authentication flow
3. Build document management interface
4. Implement chat interface
5. Create admin dashboard
### Phase 5: Integration & Testing
1. Connect frontend to backend APIs
2. Implement proper error handling
3. Add loading states and UX improvements
4. Performance optimization
5. Security hardening
### Phase 6: Deployment
1. Docker containerization
2. Environment configuration
3. Production deployment setup
4. Monitoring and logging
## Data Migration
### User Data
- Migrate from SQLite to MongoDB
- Transform user authentication to JWT
- Preserve user roles and permissions
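As a rough sketch of the user-data step, the function below reads users from SQLite and reshapes them into documents ready for MongoDB. The legacy schema is not described in this plan, so the `users` table and its `id`, `email`, `password_hash`, and `role` columns are purely hypothetical, as are the output field names other than `role`, `is_active`, and `index_access`.

```python
import sqlite3


def export_users(conn: sqlite3.Connection) -> list[dict]:
    """Read legacy users from SQLite and return MongoDB-style documents.

    Assumes a hypothetical table: users(id, email, password_hash, role).
    """
    conn.row_factory = sqlite3.Row  # allow access to columns by name
    rows = conn.execute("SELECT id, email, password_hash, role FROM users").fetchall()
    return [
        {
            "legacy_id": row["id"],  # keep the old key so records can be traced back
            "email": row["email"].strip().lower(),
            "hashed_password": row["password_hash"],
            # preserve known roles; fall back to the least-privileged role otherwise
            "role": row["role"] if row["role"] in ("admin", "user") else "user",
            "is_active": True,
            "index_access": [],
        }
        for row in rows
    ]
```

The resulting list could then be handed to a MongoDB driver's bulk insert. Whether legacy password hashes can be reused depends on the hashing scheme on both sides, which this plan does not specify.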
### Document Indices
- Keep existing ChromaDB indices
- Update index metadata in MongoDB
- Maintain document access permissions
### Configuration
- Environment variables migration
- API key management
- Cache configuration
## Key Improvements
### Performance
- Async/await throughout backend
- Redis caching for API responses
- Optimized database queries
- React Query for client-side caching
### Security
- JWT-based authentication
- Input validation with Pydantic
- CORS configuration
- Rate limiting
### Scalability
- Microservice-ready architecture
- Database connection pooling
- Horizontal scaling support
- Load balancing ready
### Developer Experience
- Type hints throughout Python code
- API documentation with FastAPI
- Modern React patterns
- Hot reloading in development
### User Experience
- Modern, responsive UI
- Real-time updates
- Better error handling
- Improved performance
## Implementation Timeline
1. **Week 1**: Backend foundation and authentication
2. **Week 2**: Core services and API endpoints
3. **Week 3**: Frontend setup and basic components
4. **Week 4**: Integration and testing
5. **Week 5**: Deployment and optimization
## File Deletion Strategy
Files will be deleted progressively as new implementations are completed:
1. **Phase 1**: Remove PHP authentication files after JWT implementation
2. **Phase 2**: Remove PHP API files after FastAPI endpoints
3. **Phase 3**: Remove Python processing scripts after service implementation
4. **Phase 4**: Remove remaining PHP files after frontend completion
5. **Phase 5**: Clean up temporary files and documentation
## Environment Configuration
### Backend (.env)
```
# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis
# Redis
REDIS_URL=redis://localhost:6379
# Authentication
JWT_SECRET_KEY=your-secret-key
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30
# OpenAI
OPENAI_API_KEY=your-openai-key
LLAMAPARSE_API_KEY=your-llamaparse-key
# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
```
### Frontend (.env)
```
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
```
## Success Criteria
- [ ] Complete feature parity with current application
- [ ] Improved performance (faster load times, better caching)
- [ ] Modern, responsive UI
- [ ] Scalable architecture
- [ ] Comprehensive API documentation
- [ ] Security improvements
- [ ] Easy deployment and maintenance
This migration will modernize the application while maintaining all existing functionality and improving performance, security, and maintainability.

349
README.md Normal file

@ -0,0 +1,349 @@
# Contract Analysis Tool v2.0
A modern, production-ready Retrieval-Augmented Generation (RAG) application for intelligent contract analysis and document Q&A. Built with FastAPI backend and React frontend.
![Architecture](https://img.shields.io/badge/Backend-FastAPI-009688)
![Frontend](https://img.shields.io/badge/Frontend-React-61DAFB)
![Database](https://img.shields.io/badge/Database-MongoDB-47A248)
![Cache](https://img.shields.io/badge/Cache-Redis-DC382D)
## 🚀 Features
- **Modern Architecture**: FastAPI + React + MongoDB + Redis
- **AI-Powered Analysis**: GPT-4 integration for contract analysis
- **Document Q&A**: Natural language queries with RAG
- **User Management**: Role-based access control
- **Real-time Processing**: Async document processing
- **Intelligent Caching**: Redis-based response caching
- **Scalable Design**: Microservice-ready architecture
## 🏗️ Architecture
```
React Frontend → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
Redis Cache
```
## 📋 Prerequisites
- **Python 3.11+**
- **Node.js 18+**
- **MongoDB 7+**
- **Redis 7+**
- **OpenAI API Key**
- **LlamaParse API Key** (optional)
## 🛠️ Installation
### Option 1: Docker (Recommended)
1. **Clone the repository**
```bash
git clone <repository-url>
cd llama-contracts-master
```
2. **Set up environment variables**
```bash
# Backend
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys
# Frontend
cp frontend/.env.example frontend/.env
```
3. **Start with Docker Compose**
```bash
cd backend
docker-compose up -d
```
4. **Start the frontend**
```bash
cd frontend
npm install
npm run dev
```
### Option 2: Manual Setup
#### Backend Setup
1. **Create Python virtual environment**
```bash
cd backend
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Set up environment variables**
```bash
cp .env.example .env
# Edit .env with your configuration
```
4. **Start MongoDB and Redis**
```bash
# MongoDB
brew services start mongodb/brew/mongodb-community
# Redis
brew services start redis
```
5. **Start the backend**
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
#### Frontend Setup
1. **Install dependencies**
```bash
cd frontend
npm install
```
2. **Set up environment variables**
```bash
cp .env.example .env
```
3. **Start the development server**
```bash
npm run dev
```
## 🔧 Configuration
### Backend Environment Variables
```env
# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis
# Redis
REDIS_URL=redis://localhost:6379
# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30
# OpenAI
OPENAI_API_KEY=your-openai-api-key
LLAMAPARSE_API_KEY=your-llamaparse-api-key
# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices
# Cache
CACHE_ENABLED=true
CACHE_TTL=3600
```
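For reference, the non-secret variables above could be loaded and type-converted roughly as follows. This is a standard-library sketch, not the project's actual settings module; the `Settings` dataclass and `load_settings` function are hypothetical names. Note that `CORS_ORIGINS` is written as a JSON list, so it is parsed with `json.loads`. Secrets such as `JWT_SECRET_KEY` and the API keys are deliberately left out because they should not have defaults.

```python
import json
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    database_name: str
    redis_url: str
    jwt_algorithm: str
    jwt_expire_minutes: int
    debug: bool
    cors_origins: list
    cache_enabled: bool
    cache_ttl: int


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings used in .env files."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        mongodb_url=env.get("MONGODB_URL", "mongodb://localhost:27017"),
        database_name=env.get("DATABASE_NAME", "contract_analysis"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        jwt_expire_minutes=int(env.get("JWT_EXPIRE_MINUTES", "30")),
        debug=_as_bool(env.get("DEBUG", "false")),
        cors_origins=json.loads(env.get("CORS_ORIGINS", '["http://localhost:3000"]')),
        cache_enabled=_as_bool(env.get("CACHE_ENABLED", "true")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.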
### Frontend Environment Variables
```env
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
```
## 🚀 Usage
### Swagger Testing
To get a token through Swagger UI:
1. Go to http://localhost:8000/docs
2. Initialize the default users by calling the `/api/v1/auth/init-users` endpoint
3. Call the `/api/v1/auth/login` endpoint with one of these sets of credentials:
   - **Admin**: email `admin@oliver.agency`, password `admin123`
   - **User**: email `user@oliver.agency`, password `user123`
4. Copy the `access_token` from the response
5. Click the "Authorize" button at the top of Swagger UI
6. Enter: `Bearer YOUR_TOKEN_HERE`
### Initial Setup
1. **Access the application**: http://localhost:3000
2. **Initialize default users** (first time only):
```bash
curl -X POST http://localhost:8000/api/v1/auth/init-users
```
### Default Credentials
- **Admin**: `admin@oliver.agency` / `admin123`
- **User**: `user@oliver.agency` / `user123`
### Workflow
1. **Login** with admin or user credentials
2. **Create an Index** for your document collection
3. **Upload Documents** to the index
4. **Chat** with your documents using natural language
5. **Manage Users** (admin only)
## 📚 API Documentation
- **FastAPI Docs**: http://localhost:8000/docs (development only)
- **ReDoc**: http://localhost:8000/redoc (development only)
### Key Endpoints
- `POST /api/v1/auth/login` - User authentication
- `POST /api/v1/indices/create` - Create document index
- `POST /api/v1/documents/upload` - Upload documents
- `POST /api/v1/chat/query` - Query documents
- `GET /api/v1/admin/stats` - System statistics
## 🔒 Security Features
- **JWT Authentication** with role-based access
- **Input Validation** with Pydantic schemas
- **CORS Protection** for frontend integration
- **File Upload Validation** with type/size checks
- **Rate Limiting** (configurable)
- **Environment Variable Protection**
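To show what HS256 JWT authentication involves, here is a standard-library sketch of signing and verifying a token. It is for illustration only and is not the project's implementation; the backend presumably relies on a maintained JWT library, and the function names below are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_token(claims: dict, secret: str) -> str:
    """Serialize header and claims, then append an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_token(token: str, secret: str, now=None) -> dict:
    """Verify signature, algorithm, and expiry; raise ValueError on failure."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes through timing
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

With `JWT_EXPIRE_MINUTES=30`, the `exp` claim would be set to the issue time plus 1800 seconds.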
## ⚡ Performance Features
- **Async Processing** throughout the backend
- **Redis Caching** for API responses
- **Vector Search** with ChromaDB
- **Connection Pooling** for databases
- **Optimized Queries** with MongoDB indexes
## 🧪 Development
### Backend Development
```bash
cd backend
source venv/bin/activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
### Frontend Development
```bash
cd frontend
npm run dev
```
### Database Migration
The application automatically creates database collections and indexes on startup.
## 📊 Monitoring
### Health Check
```bash
curl http://localhost:8000/health
```
### System Stats (Admin)
```bash
curl -H "Authorization: Bearer <token>" http://localhost:8000/api/v1/admin/stats
```
## 🐛 Troubleshooting
### Common Issues
1. **MongoDB Connection Failed**
- Ensure MongoDB is running: `brew services start mongodb-community`
- Check connection string in `.env`
2. **Redis Connection Failed**
- Ensure Redis is running: `brew services start redis`
- Application will continue without caching if Redis is unavailable
3. **OpenAI API Errors**
- Verify API key in backend `.env`
- Check API quota and billing
4. **File Upload Issues**
- Check file size limits (50MB default)
- Verify file types are supported
- Ensure upload directory permissions
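The upload checks in item 4 can be reproduced locally before sending a file. This is a sketch: the 50MB limit comes from the note above, but the set of allowed extensions is an assumption, since the supported file types are not listed here, and `validate_upload` is a hypothetical helper.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50MB default mentioned above
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}  # assumption: real list not stated


def validate_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file exceeds {MAX_UPLOAD_BYTES} bytes")
    return problems
```

Checking the extension alone does not prove the content type, so server-side validation is still needed.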
### Logs
- **Backend logs**: Console output from uvicorn
- **Frontend logs**: Browser console
- **Database logs**: MongoDB logs in data directory
## 🔄 Migration from v1.0
The new system provides complete feature parity with the original PHP application:
- ✅ All PHP functionality migrated to FastAPI
- ✅ SQLite data can be migrated to MongoDB
- ✅ Existing ChromaDB indices are compatible
- ✅ All document processing features preserved
- ✅ Enhanced performance and security
## 🚀 Deployment
### Production Deployment
1. **Set production environment variables**
2. **Use production database URLs**
3. **Enable HTTPS with SSL certificates**
4. **Configure reverse proxy (Nginx)**
5. **Set up monitoring and logging**
6. **Regular backups of MongoDB**
### Docker Production
```bash
docker-compose -f docker-compose.prod.yml up -d
```
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## 📄 License
This project is licensed under the MIT License.
## 🙏 Acknowledgments
- **OpenAI** - GPT-4 and embedding models
- **LlamaIndex** - RAG framework
- **ChromaDB** - Vector storage
- **FastAPI** - Modern Python web framework
- **React** - Frontend framework
---
**Built with ❤️ for intelligent contract analysis**

25
backend/.env.example Normal file

@ -0,0 +1,25 @@
# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis
# Redis
REDIS_URL=redis://localhost:6379
# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key-change-this-in-production
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30
# OpenAI
OPENAI_API_KEY=your-openai-api-key-here
LLAMAPARSE_API_KEY=your-llamaparse-api-key-here
# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices
# Cache
CACHE_ENABLED=true
CACHE_TTL=3600

25
backend/Dockerfile Normal file

@ -0,0 +1,25 @@
FROM python:3.11-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*
# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Create directories
RUN mkdir -p uploads indices
# Expose port
EXPOSE 8000
# Command to run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

1
backend/app/__init__.py Normal file

@ -0,0 +1 @@
# FastAPI Backend for Contract Analysis Tool


@ -0,0 +1 @@
# API package


@ -0,0 +1 @@
# API v1 package

699
backend/app/api/v1/admin.py Normal file

@ -0,0 +1,699 @@
from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Form
from motor.motor_asyncio import AsyncIOMotorDatabase
from typing import List, Optional
from bson import ObjectId
from datetime import datetime
from urllib.parse import unquote
from ...config.database import get_database
from ...config.settings import settings
from ...core.auth import get_current_admin_user
from ...models.user import UserInDB, UserUpdate, UserRole
from ...models.document import DocumentInDB
from ...models.index import IndexInDB
from ...services.llama_processor import llama_processor
router = APIRouter()
@router.get("/users", response_model=List[dict])
async def get_all_users(
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get all users (admin only)"""
    users = []
    cursor = db.users.find({})
    async for user in cursor:
        user_obj = UserInDB(**user)
        users.append({
            "id": str(user_obj.id),
            "email": user_obj.email,
            "role": user_obj.role,
            "is_active": user_obj.is_active,
            "index_access": user_obj.index_access,
            "created_at": user_obj.created_at,
            "updated_at": user_obj.updated_at
        })
    return users
@router.post("/users")
async def create_user(
    user_data: dict,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Create a new user (admin only)"""
    try:
        from ...core.security import get_password_hash
        # Reject incomplete payloads with a 400 instead of a 500 caused by a KeyError
        if not user_data.get("email") or not user_data.get("password"):
            raise HTTPException(status_code=400, detail="email and password are required")
        # Check if email already exists
        existing_user = await db.users.find_one({"email": user_data["email"]})
        if existing_user:
            raise HTTPException(status_code=400, detail="Email already registered")
        # Hash the password
        hashed_password = get_password_hash(user_data["password"])
        # Create user document
        new_user = {
            "email": user_data["email"],
            "hashed_password": hashed_password,
            "role": user_data.get("role", "user"),
            "is_active": user_data.get("is_active", True),
            "index_access": [],
            "created_at": datetime.utcnow(),
            "updated_at": datetime.utcnow()
        }
        # Insert user
        result = await db.users.insert_one(new_user)
        return {
            "message": "User created successfully",
            "user_id": str(result.inserted_id),
            "email": user_data["email"]
        }
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error creating user: {str(e)}")
@router.put("/users/{user_id}")
async def update_user(
    user_id: str,
    user_update: UserUpdate,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Update a user (admin only)"""
    # Reject malformed IDs with a 400 instead of an unhandled InvalidId error
    if not ObjectId.is_valid(user_id):
        raise HTTPException(status_code=400, detail="Invalid user ID")
    # Check if user exists
    user = await db.users.find_one({"_id": ObjectId(user_id)})
    if not user:
        raise HTTPException(status_code=404, detail="User not found")
    # Prepare update data
    update_data = {}
    if user_update.email is not None:
        # Check if email is already taken
        existing = await db.users.find_one({
            "email": user_update.email,
            "_id": {"$ne": ObjectId(user_id)}
        })
        if existing:
            raise HTTPException(status_code=400, detail="Email already in use")
        update_data["email"] = user_update.email
    if user_update.role is not None:
        update_data["role"] = user_update.role
    if user_update.is_active is not None:
        update_data["is_active"] = user_update.is_active
    if user_update.password is not None:
        from ...core.security import get_password_hash
        update_data["hashed_password"] = get_password_hash(user_update.password)
    if update_data:
        update_data["updated_at"] = datetime.utcnow()
        await db.users.update_one(
            {"_id": ObjectId(user_id)},
            {"$set": update_data}
        )
    return {"message": "User updated successfully"}
@router.delete("/users/{user_id}")
async def delete_user(
    user_id: str,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Delete a user (admin only)"""
    # Reject malformed IDs with a 400 instead of an unhandled InvalidId error
    if not ObjectId.is_valid(user_id):
        raise HTTPException(status_code=400, detail="Invalid user ID")
    # Don't allow admin to delete themselves
    if str(admin_user.id) == user_id:
        raise HTTPException(status_code=400, detail="Cannot delete your own account")
    # Check if user exists
    user = await db.users.find_one({"_id": ObjectId(user_id)})
    if not user:
        raise HTTPException(status_code=404, detail="User not found")
    # Delete user
    await db.users.delete_one({"_id": ObjectId(user_id)})
    return {"message": "User deleted successfully"}
@router.post("/users/{user_id}/grant-index-access")
async def grant_index_access(
user_id: str,
request_data: dict,
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Grant user access to an index"""
index_id = request_data.get("index_id")
if not index_id:
raise HTTPException(status_code=400, detail="index_id is required")
# Check if user exists
user = await db.users.find_one({"_id": ObjectId(user_id)})
if not user:
raise HTTPException(status_code=404, detail="User not found")
# Check if index exists
index = await db.indices.find_one({"index_id": index_id})
if not index:
raise HTTPException(status_code=404, detail="Index not found")
# Grant access
await db.users.update_one(
{"_id": ObjectId(user_id)},
{"$addToSet": {"index_access": index_id}}
)
return {"message": "Index access granted successfully"}
@router.post("/users/{user_id}/revoke-index-access")
async def revoke_index_access(
user_id: str,
request_data: dict,
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Revoke user access to an index"""
index_id = request_data.get("index_id")
if not index_id:
raise HTTPException(status_code=400, detail="index_id is required")
# Check if user exists
user = await db.users.find_one({"_id": ObjectId(user_id)})
if not user:
raise HTTPException(status_code=404, detail="User not found")
# Revoke access
await db.users.update_one(
{"_id": ObjectId(user_id)},
{"$pull": {"index_access": index_id}}
)
return {"message": "Index access revoked successfully"}
@router.post("/grant-all-indices/{user_id}")
async def grant_all_indices_access(
user_id: str,
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Grant user access to all indices"""
# Check if user exists
user = await db.users.find_one({"_id": ObjectId(user_id)})
if not user:
raise HTTPException(status_code=404, detail="User not found")
# Get all active indices
indices = []
cursor = db.indices.find({"status": "active"})
async for index in cursor:
indices.append(index["index_id"])
# Grant access to all indices
await db.users.update_one(
{"_id": ObjectId(user_id)},
{"$set": {"index_access": indices}}
)
return {
"message": "Access granted to all indices",
"index_count": len(indices)
}
@router.get("/stats")
async def get_system_stats(
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get system statistics"""
# Count users
total_users = await db.users.count_documents({})
active_users = await db.users.count_documents({"is_active": True})
admin_users = await db.users.count_documents({"role": UserRole.ADMIN})
# Count indices
total_indices = await db.indices.count_documents({"status": "active"})
# Count documents
total_documents = await db.documents.count_documents({})
pending_documents = await db.documents.count_documents({"processing_status": "pending"})
processing_documents = await db.documents.count_documents({"processing_status": "processing"})
completed_documents = await db.documents.count_documents({"processing_status": "completed"})
failed_documents = await db.documents.count_documents({"processing_status": "failed"})
# Count chat messages
total_messages = await db.chat_messages.count_documents({})
return {
"users": {
"total": total_users,
"active": active_users,
"admins": admin_users
},
"indices": {
"total": total_indices
},
"documents": {
"total": total_documents,
"pending": pending_documents,
"processing": processing_documents,
"completed": completed_documents,
"failed": failed_documents
},
"chat_messages": {
"total": total_messages
}
}
@router.post("/documents/upload-single")
async def upload_single_document(
file: UploadFile = File(...),
index_id: str = Form(...),
custom_name: Optional[str] = Form(None),
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Upload a single document for processing (admin only)"""
# Verify index exists
index = await db.indices.find_one({"index_id": index_id})
if not index:
raise HTTPException(status_code=404, detail="Index not found")
# Process document
document = await llama_processor.process_single_file(
file, index_id, admin_user, db, custom_name
)
return {
"message": "Document uploaded and processing started",
"document_id": str(document.id),
"filename": document.original_filename,
"status": document.processing_status
}
@router.post("/documents/upload-multiple")
async def upload_multiple_documents(
files: List[UploadFile] = File(...),
index_id: str = Form(...),
base_name: str = Form(...),
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Upload multiple documents for processing (admin only)"""
# Verify index exists
index = await db.indices.find_one({"index_id": index_id})
if not index:
raise HTTPException(status_code=404, detail="Index not found")
# Process documents
documents = await llama_processor.process_multiple_files(
files, index_id, admin_user, db, base_name
)
return {
"message": f"Successfully uploaded {len(documents)} documents",
"documents": [
{
"id": str(doc.id),
"filename": doc.original_filename,
"status": doc.processing_status
} for doc in documents
]
}
@router.get("/documents/processing-status")
async def get_processing_status(
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get overall processing status (admin only)"""
# Get counts for each status
statuses = {
"pending": 0,
"processing": 0,
"completed": 0,
"failed": 0
}
# Count processing status
for status in statuses.keys():
statuses[status] = await db.documents.count_documents({
"processing_status": status
})
# Count embedding status
embedding_statuses = {}
for status in statuses.keys():
embedding_statuses[status] = await db.documents.count_documents({
"embedding_status": status
})
# Get documents with unclear status
unclear_docs = await db.documents.count_documents({
"$or": [
{"processing_status": {"$exists": False}},
{"embedding_status": {"$exists": False}}
]
})
return {
"processing_status": statuses,
"embedding_status": embedding_statuses,
"unclear_status_count": unclear_docs,
"total_documents": sum(statuses.values())
}
@router.get("/documents/{index_id}")
async def get_index_documents(
index_id: str,
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get all documents for an index (admin only)"""
# Decode URL-encoded index_id
decoded_index_id = unquote(index_id)
# Verify index exists
index = await db.indices.find_one({"index_id": decoded_index_id})
if not index:
raise HTTPException(status_code=404, detail="Index not found")
# Get documents
documents = []
cursor = db.documents.find({"index_id": decoded_index_id})
async for doc in cursor:
documents.append({
"id": str(doc["_id"]),
"filename": doc["original_filename"],
"file_size": doc["file_size"],
"processing_status": doc["processing_status"],
"embedding_status": doc["embedding_status"],
"chunk_count": doc.get("chunk_count", 0),
"created_at": doc["created_at"],
"updated_at": doc["updated_at"]
})
return {
"index_id": decoded_index_id,
"index_name": index["name"],
"documents": documents,
"total_documents": len(documents)
}
@router.post("/documents/{document_id}/reprocess")
async def reprocess_document(
document_id: str,
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Reprocess a document (admin only)"""
# Get document
document = await db.documents.find_one({"_id": ObjectId(document_id)})
if not document:
raise HTTPException(status_code=404, detail="Document not found")
# Reset processing status
await db.documents.update_one(
{"_id": ObjectId(document_id)},
{"$set": {
"processing_status": "pending",
"embedding_status": "pending",
"parsed_text": None,
"text_chunks": None,
"chunk_count": 0,
"vector_ids": None,
"updated_at": datetime.utcnow()
}}
)
# Create document object for reprocessing
doc_obj = DocumentInDB(**document)
# Start reprocessing
import asyncio
asyncio.create_task(llama_processor._process_document_async(doc_obj, db))
return {
"message": "Document reprocessing started",
"document_id": document_id,
"status": "pending"
}
@router.delete("/documents/{document_id}")
async def delete_document(
document_id: str,
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Delete a document and its embeddings (admin only)"""
# Get document
document = await db.documents.find_one({"_id": ObjectId(document_id)})
if not document:
raise HTTPException(status_code=404, detail="Document not found")
# Use the enhanced document processor for complete cleanup
from ...services.document_processor import document_processor
success = await document_processor.delete_document(document_id, db)
if not success:
raise HTTPException(status_code=500, detail="Failed to delete document")
return {
"message": "Document deleted successfully",
"document_id": document_id
}
@router.post("/chat/query")
async def admin_chat_query(
query: str = Form(...),
index_id: str = Form(...),
top_k: int = Form(5),
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Query documents using RAG (admin only)"""
# Verify index exists
index = await db.indices.find_one({"index_id": index_id})
if not index:
raise HTTPException(status_code=404, detail="Index not found")
try:
# Query vector store
results = await llama_processor.query_documents(
query, index_id, top_k
)
# Extract context chunks
context_chunks = [result["content"] for result in results]
# Generate response
response = await llama_processor.generate_response(
query, context_chunks, index_id
)
return {
"query": query,
"response": response,
"sources": results,
"index_id": index_id
}
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Error processing query: {str(e)}"
)
@router.get("/indices")
async def get_all_indices(
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get all indices (admin only)"""
indices = []
cursor = db.indices.find({"status": "active"})
async for index in cursor:
# Count documents for this index
doc_count = await db.documents.count_documents({"index_id": index["index_id"]})
indices.append({
"id": str(index["_id"]),
"index_id": index["index_id"],
"name": index["name"],
"description": index.get("description", ""),
"document_count": doc_count,
"created_by": str(index["created_by"]),
"created_at": index["created_at"],
"chunk_size": index.get("chunk_size", 1000),
"chunk_overlap": index.get("chunk_overlap", 200)
})
return {
"indices": indices,
"total": len(indices)
}
@router.post("/indices/create")
async def create_index(
name: str = Form(...),
description: Optional[str] = Form(None),
chunk_size: int = Form(1000),
chunk_overlap: int = Form(200),
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Create a new index (admin only)"""
import uuid
# Generate unique index ID
index_id = str(uuid.uuid4())
# Create index record
index_data = {
"index_id": index_id,
"name": name,
"description": description,
"created_by": admin_user.id,
"created_at": datetime.utcnow(),
"updated_at": datetime.utcnow(),
"status": "active",
"document_count": 0,
"chunk_size": chunk_size,
"chunk_overlap": chunk_overlap,
"embedding_model": "text-embedding-ada-002",
"settings": {}
}
# Save to database
result = await db.indices.insert_one(index_data)
return {
"message": "Index created successfully",
"index_id": index_id,
"name": name,
"id": str(result.inserted_id)
}
@router.delete("/indices/{index_id}")
async def delete_index(
index_id: str,
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Delete an index and all its documents (admin only)"""
# Decode URL-encoded index_id
decoded_index_id = unquote(index_id)
# Check if index exists
index = await db.indices.find_one({"index_id": decoded_index_id})
if not index:
raise HTTPException(status_code=404, detail="Index not found")
try:
# Get all documents for this index
documents_cursor = db.documents.find({"index_id": decoded_index_id})
document_count = 0
async for doc in documents_cursor:
document_count += 1
# Delete embeddings from vector store
await llama_processor.delete_document_embeddings(
str(doc["_id"]), decoded_index_id
)
# Delete file from filesystem
from pathlib import Path
file_path = Path(doc["file_path"])
if file_path.exists():
try:
file_path.unlink()
except Exception as e:
print(f"Error deleting file {file_path}: {e}")
# Delete all documents for this index
await db.documents.delete_many({"index_id": decoded_index_id})
# Delete all chat messages for this index
chat_result = await db.chat_messages.delete_many({"index_id": decoded_index_id})
# Delete the index record
await db.indices.delete_one({"index_id": decoded_index_id})
# Use complete ChromaDB cleanup instead of manual deletion
from ...services.rag_service import rag_service
cleanup_result = await rag_service.delete_index_complete(decoded_index_id)
if not cleanup_result["success"]:
print(f"Warning during complete index cleanup: {cleanup_result['message']}")
# Note: Cache clearing removed - caching is disabled for data freshness
# Delete index upload directory
from pathlib import Path
index_dir = Path(settings.upload_dir) / decoded_index_id
if index_dir.exists():
try:
import shutil
shutil.rmtree(index_dir)
except Exception as e:
print(f"Error deleting index directory {index_dir}: {e}")
return {
"message": "Index deleted successfully",
"index_id": decoded_index_id,
"documents_deleted": document_count,
"chat_messages_deleted": chat_result.deleted_count
}
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Error deleting index: {str(e)}"
)
@router.post("/documents/process-pending")
async def process_pending_documents(
admin_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Process all pending documents (admin only)"""
import asyncio
# Find all documents that are pending or failed
cursor = db.documents.find({
"$or": [
{"processing_status": "pending"},
{"processing_status": "failed"},
{"embedding_status": "pending"},
{"embedding_status": "failed"}
]
})
processed_count = 0
async for doc in cursor:
try:
document = DocumentInDB(**doc)
# Start processing
asyncio.create_task(llama_processor._process_document_async(document, db))
processed_count += 1
except Exception as e:
print(f"Error queueing document {doc['_id']} for processing: {e}")
return {
"message": f"Queued {processed_count} documents for processing",
"count": processed_count
}

303
backend/app/api/v1/auth.py Normal file

@ -0,0 +1,303 @@
from fastapi import APIRouter, Depends, HTTPException, status # type: ignore
from fastapi.security import HTTPBearer # type: ignore
from motor.motor_asyncio import AsyncIOMotorDatabase # type: ignore
from pydantic import BaseModel # type: ignore
from datetime import timedelta
from bson import ObjectId # type: ignore
import logging
from ...config.database import get_database
from ...config.settings import settings
from ...core.security import verify_password, get_password_hash, create_access_token
from ...models.user import UserInDB, UserCreate, UserRole, UserResponse, AuthMethod
from ...core.auth import get_current_active_user
from ...services.sso_service import sso_service
router = APIRouter()
logger = logging.getLogger(__name__)
class LoginRequest(BaseModel):
    email: str
    password: str
class LoginResponse(BaseModel):
    access_token: str
    token_type: str
    user: dict
class RegisterRequest(BaseModel):
    email: str
    password: str
    role: UserRole = UserRole.USER
class SSOLoginRequest(BaseModel):
    access_token: str
class SSOConfigResponse(BaseModel):
    client_id: str
    authority: str
    redirect_uri: str
    enabled: bool
@router.post("/login", response_model=LoginResponse)
async def login(
    login_data: LoginRequest,
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Authenticate user with local credentials and return access token"""
    # Find user by email
    user = await db.users.find_one({"email": login_data.email})
    if not user:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect email or password",
            headers={"WWW-Authenticate": "Bearer"},
        )
    user_obj = UserInDB(**user)
    # Check if user has a local password (for local auth)
    if not user_obj.hashed_password:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="User account requires SSO authentication",
            headers={"WWW-Authenticate": "Bearer"},
        )
    # Verify password
    if not verify_password(login_data.password, user_obj.hashed_password):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect email or password",
            headers={"WWW-Authenticate": "Bearer"},
        )
    # Check if user is active
    if not user_obj.is_active:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Inactive user"
        )
    # Create access token
    access_token = create_access_token(data={"sub": str(user_obj.id)})
    return LoginResponse(
        access_token=access_token,
        token_type="bearer",
        user={
            "id": str(user_obj.id),
            "email": user_obj.email,
            "role": user_obj.role,
            "is_active": user_obj.is_active,
            "index_access": user_obj.index_access
        }
    )
@router.post("/register", response_model=dict)
async def register(
register_data: RegisterRequest,
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Register a new user"""
# Check if user already exists
existing_user = await db.users.find_one({"email": register_data.email})
if existing_user:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="User with this email already exists"
)
# Create new user
hashed_password = get_password_hash(register_data.password)
user_data = UserCreate(
email=register_data.email,
password=register_data.password,
role=register_data.role
)
user_dict = user_data.dict()
user_dict["hashed_password"] = hashed_password
del user_dict["password"]
user_dict["index_access"] = []
# Insert user into database
result = await db.users.insert_one(user_dict)
return {"message": "User registered successfully", "user_id": str(result.inserted_id)}
@router.get("/me", response_model=UserResponse)
async def get_current_user_info(
current_user: UserInDB = Depends(get_current_active_user)
):
"""Get current user information"""
return UserResponse(
_id=current_user.id,
email=current_user.email,
role=current_user.role,
is_active=current_user.is_active,
index_access=current_user.index_access,
auth_method=current_user.auth_method,
sso_provider=current_user.sso_provider,
sso_name=current_user.sso_name,
last_sso_login=current_user.last_sso_login,
created_at=current_user.created_at,
updated_at=current_user.updated_at
)
@router.post("/refresh", response_model=LoginResponse)
async def refresh_token(
current_user: UserInDB = Depends(get_current_active_user)
):
"""Refresh access token for active user"""
# Create new access token
access_token = create_access_token(data={"sub": str(current_user.id)})
return LoginResponse(
access_token=access_token,
token_type="bearer",
user={
"id": str(current_user.id),
"email": current_user.email,
"role": current_user.role,
"is_active": current_user.is_active,
"auth_method": current_user.auth_method,
"sso_provider": current_user.sso_provider,
"sso_name": current_user.sso_name,
"index_access": current_user.index_access
}
)
@router.post("/logout")
async def logout():
"""Logout user (client should discard token)"""
return {"message": "Logged out successfully"}
@router.get("/sso/config", response_model=SSOConfigResponse)
async def get_sso_config():
"""Get SSO configuration for frontend"""
if not settings.sso_enabled:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="SSO is not enabled"
)
if not all([settings.azure_client_id, settings.azure_authority, settings.azure_redirect_uri]):
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="SSO is not properly configured"
)
return SSOConfigResponse(
client_id=settings.azure_client_id,
authority=settings.azure_authority,
redirect_uri=settings.azure_redirect_uri,
enabled=settings.sso_enabled
)
@router.post("/sso/validate", response_model=LoginResponse)
async def sso_login(sso_data: SSOLoginRequest):
"""Validate SSO token and authenticate user"""
logger.info("=== SSO Login Request ===")
logger.info(f"SSO enabled: {settings.sso_enabled}")
logger.info(f"Token length: {len(sso_data.access_token) if sso_data.access_token else 'None'}")
if not settings.sso_enabled:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="SSO is not enabled"
)
try:
logger.info("Starting SSO token processing...")
# Process SSO login using the service
user = await sso_service.process_sso_login(sso_data.access_token)
logger.info(f"SSO processing successful, user: {user.email}")
# Create our internal JWT token
access_token = create_access_token(data={"sub": str(user.id)})
logger.info("Internal JWT token created successfully")
return LoginResponse(
access_token=access_token,
token_type="bearer",
user={
"id": str(user.id),
"email": user.email,
"role": user.role,
"is_active": user.is_active,
"auth_method": user.auth_method,
"sso_provider": user.sso_provider,
"sso_name": user.sso_name,
"index_access": user.index_access
}
)
except HTTPException:
raise
except Exception as e:
logger.error(f"SSO authentication failed: {str(e)}", exc_info=True)
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"SSO authentication failed: {str(e)}"
)
@router.post("/login/local", response_model=LoginResponse)
async def local_login(
login_data: LoginRequest,
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Explicit local authentication (backup admin login)"""
if not settings.allow_local_admin:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Local authentication is disabled"
)
# Only allow admin@oliver.agency for local backup
if login_data.email != "admin@oliver.agency":
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Local authentication only available for admin account"
)
# Use the same logic as regular login
return await login(login_data, db)
# Initialize default users
@router.post("/init-users")
async def init_default_users(
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Initialize default users (admin and user)"""
# Check if admin user exists
admin_exists = await db.users.find_one({"email": "admin@oliver.agency"})
if not admin_exists:
admin_user = {
"email": "admin@oliver.agency",
"hashed_password": get_password_hash("admin123"),
"role": UserRole.ADMIN,
"is_active": True,
"auth_method": AuthMethod.LOCAL,
"index_access": [],
"created_at": None,
"updated_at": None
}
await db.users.insert_one(admin_user)
# Check if regular user exists
user_exists = await db.users.find_one({"email": "user@oliver.agency"})
if not user_exists:
regular_user = {
"email": "user@oliver.agency",
"hashed_password": get_password_hash("user123"),
"role": UserRole.USER,
"is_active": True,
"auth_method": AuthMethod.LOCAL,
"index_access": [],
"created_at": None,
"updated_at": None
}
await db.users.insert_one(regular_user)
return {"message": "Default users initialized"}
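
The `init-users` handler above hashes the literal passwords `admin123` and `user123`. The following is a minimal sketch of one alternative, not something this commit contains: read the seed password from the environment and fall back to a random value. The helper name and the variable name `DEFAULT_ADMIN_PASSWORD` are hypothetical.

```python
import os
import secrets
from typing import Mapping, Optional, Tuple


def resolve_seed_password(
    env_var: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[str, bool]:
    """Return (password, generated) for a default account.

    env_var is a hypothetical variable name, not one defined by this project.
    """
    env = os.environ if environ is None else environ
    configured = env.get(env_var, "").strip()
    if configured:
        # The operator supplied a password; use it as given.
        return configured, False
    # Nothing configured: generate a random value. The caller would have to
    # show it to the operator once, because only its hash gets stored.
    return secrets.token_urlsafe(16), True
```

Under these assumptions, the returned password would replace the literal passed to `get_password_hash` in `init_default_users`.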

406
backend/app/api/v1/chat.py Normal file

@ -0,0 +1,406 @@
from fastapi import APIRouter, Depends, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase
from typing import Dict, Any, List
import time
from datetime import datetime
from bson import ObjectId
from urllib.parse import unquote
from ...config.database import get_database
from ...core.auth import get_current_active_user
# Cache import removed - caching disabled for data freshness
from ...models.user import UserInDB
from ...models.chat import ChatQuery, ChatResponse, ChatMessageCreate, ChatMessageInDB
from ...services.rag_service import rag_service
from ...services.llama_processor import llama_processor
from ...services.chat_context_service import chat_context_service
router = APIRouter()
@router.post("/query", response_model=ChatResponse)
async def chat_query(
query_data: ChatQuery,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Process a chat query against a document index"""
start_time = time.time()
# Check if user has access to this index
if current_user.role.value != "admin" and query_data.index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this index")
# Check if index exists in database
index = await db.indices.find_one({"index_id": query_data.index_id})
if not index:
raise HTTPException(
status_code=404,
detail=f"Index '{query_data.index_id}' not found"
)
# Check if index has processed documents
processed_docs = await db.documents.count_documents({
"index_id": query_data.index_id,
"processing_status": "completed",
"embedding_status": "completed"
})
# If no completed documents, check for any documents at all
if processed_docs == 0:
total_docs = await db.documents.count_documents({"index_id": query_data.index_id})
if total_docs == 0:
raise HTTPException(
status_code=400,
detail=f"Index '{query_data.index_id}' has no documents. Please upload documents first."
)
else:
# Check processing status
processing_docs = await db.documents.count_documents({
"index_id": query_data.index_id,
"$or": [
{"processing_status": "processing"},
{"embedding_status": "processing"},
{"processing_status": "pending"},
{"embedding_status": "pending"}
]
})
failed_docs = await db.documents.count_documents({
"index_id": query_data.index_id,
"$or": [
{"processing_status": "failed"},
{"embedding_status": "failed"}
]
})
if processing_docs > 0:
raise HTTPException(
status_code=400,
detail=f"Index '{query_data.index_id}' has {processing_docs} documents still processing. Please wait for processing to complete."
)
elif failed_docs > 0:
raise HTTPException(
status_code=400,
detail=f"Index '{query_data.index_id}' has {failed_docs} documents that failed to process. Please check the admin panel and reprocess the documents."
)
else:
# Documents exist but their status is unclear; check whether any have parsed text.
# $nin excludes both None and "" (a repeated "$ne" key in one dict keeps only the last value).
docs_with_text = await db.documents.count_documents({
"index_id": query_data.index_id,
"parsed_text": {"$exists": True, "$nin": [None, ""]}
})
if docs_with_text > 0:
print(f"Warning: Index {query_data.index_id} has documents with unclear processing status but {docs_with_text} have parsed text")
# Continue with the query attempt
else:
raise HTTPException(
status_code=400,
detail=f"Index '{query_data.index_id}' has documents but none have been processed successfully. Please check the admin panel."
)
# Query vector store for relevant chunks
try:
vector_results = await llama_processor.query_documents(
query_data.query, query_data.index_id, top_k=10
)
# Extract context chunks
context_chunks = [result["content"] for result in vector_results]
# Generate contextual response with conversation history
ai_result = await chat_context_service.generate_contextual_response(
query_data.query,
query_data.index_id,
str(current_user.id),
db,
context_chunks
)
result = {
"success": True,
"response": ai_result["response"],
"sources": vector_results,
"context_used": ai_result.get("context_used"),
"context_messages_count": ai_result.get("context_messages_count", 0)
}
except Exception as e:
print(f"Error in chat query: {e}")
# Handle specific ChromaDB errors more gracefully
if "does not exist" in str(e) or "Collection" in str(e):
# Check document count again
total_docs = await db.documents.count_documents({"index_id": query_data.index_id})
if total_docs == 0:
raise HTTPException(
status_code=404,
detail=f"No documents found in index '{query_data.index_id}'. Please upload documents first."
)
else:
raise HTTPException(
status_code=404,
detail=f"Vector database not ready for index '{query_data.index_id}'. The documents may still be processing. Please wait and try again."
)
else:
raise HTTPException(
status_code=500,
detail=f"Error processing query: {str(e)}"
)
# Result is always successful at this point since we handle errors above
# Prepare response
response_time = time.time() - start_time
debug_info = {
"sources": result.get("sources", []),
"context_used": result.get("context_used"),
"context_messages_count": result.get("context_messages_count", 0),
"cached": False,
"response_time": response_time
}
response = ChatResponse(
response=result["response"],
debug_info=debug_info,
cached=False,
response_time=response_time
)
# Save chat message to database
await _save_chat_message(
current_user.id, query_data, response, db
)
return response
@router.get("/history/{index_id}")
async def get_chat_history(
index_id: str,
limit: int = 50,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get chat history for a specific index"""
# Decode URL-encoded index_id
decoded_index_id = unquote(index_id)
# Check if user has access to this index
if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this index")
# Get chat messages in chronological order (oldest first), excluding soft-deleted
cursor = db.chat_messages.find({
"user_id": current_user.id,
"index_id": decoded_index_id,
"deleted_by_user": {"$ne": True}
}).sort("created_at", 1).limit(limit)
messages = []
async for msg in cursor:
message = ChatMessageInDB(**msg)
# Use separate timestamps if available, otherwise use created_at
user_time = msg.get("user_timestamp", message.created_at)
assistant_time = msg.get("assistant_timestamp", message.created_at)
messages.append({
"id": str(message.id),
"query": message.query,
"response": message.response,
"created_at": message.created_at,
"user_timestamp": user_time,
"assistant_timestamp": assistant_time,
"response_time": message.response_time,
"cached": message.cached,
"sources": message.sources,
"context_used": message.context_used
})
return {"messages": messages}
@router.delete("/history/{index_id}")
async def clear_chat_history(
index_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Clear chat history for a specific index (soft delete)"""
# Decode URL-encoded index_id
decoded_index_id = unquote(index_id)
# Check if user has access to this index
if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this index")
# Soft delete chat messages by marking them as deleted
result = await db.chat_messages.update_many(
{
"user_id": current_user.id,
"index_id": decoded_index_id,
"deleted_by_user": {"$ne": True}
},
{
"$set": {
"deleted_by_user": True,
"updated_at": datetime.utcnow()
}
}
)
# Note: Cache clearing removed - caching is disabled for data freshness
return {"message": f"Cleared {result.modified_count} chat messages"}
@router.get("/context/{index_id}")
async def get_conversation_context(
index_id: str,
limit: int = 5,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get conversation context for debugging/display"""
# Decode URL-encoded index_id
decoded_index_id = unquote(index_id)
# Check if user has access to this index
if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this index")
# Get conversation context
context_messages = await chat_context_service.get_conversation_context(
str(current_user.id), decoded_index_id, db, limit
)
return {
"context_messages": context_messages,
"count": len(context_messages),
"formatted_context": chat_context_service.format_context_for_ai(context_messages)
}
@router.get("/index-status/{index_id}")
async def get_index_chat_status(
index_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Check if an index is ready for chat queries"""
# Decode URL-encoded index_id
decoded_index_id = unquote(index_id)
# Check if user has access to this index
if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this index")
# Check if index exists in database
index = await db.indices.find_one({"index_id": decoded_index_id})
if not index:
return {
"ready": False,
"reason": "Index not found",
"details": {
"index_exists": False,
"total_documents": 0,
"processed_documents": 0,
"failed_documents": 0,
"processing_documents": 0
}
}
# Get document statistics
total_docs = await db.documents.count_documents({"index_id": decoded_index_id})
processed_docs = await db.documents.count_documents({
"index_id": decoded_index_id,
"processing_status": "completed",
"embedding_status": "completed"
})
failed_docs = await db.documents.count_documents({
"index_id": decoded_index_id,
"$or": [
{"processing_status": "failed"},
{"embedding_status": "failed"}
]
})
processing_docs = await db.documents.count_documents({
"index_id": decoded_index_id,
"$or": [
{"processing_status": "processing"},
{"embedding_status": "processing"},
{"processing_status": "pending"},
{"embedding_status": "pending"}
]
})
# Check ChromaDB collection
collection_info = llama_processor.get_collection_info(decoded_index_id)
# Determine if ready
ready = processed_docs > 0 and collection_info["exists"] and collection_info["count"] > 0
reason = ""
if total_docs == 0:
reason = "No documents uploaded"
elif processed_docs == 0:
if processing_docs > 0:
reason = f"{processing_docs} documents still processing"
elif failed_docs > 0:
reason = f"All {failed_docs} documents failed to process"
else:
reason = "No documents have been processed successfully"
elif not collection_info["exists"]:
reason = "Vector database collection not found"
elif collection_info["count"] == 0:
reason = "Vector database collection is empty"
return {
"ready": ready,
"reason": reason if not ready else "Index ready for queries",
"details": {
"index_exists": True,
"index_name": index["name"],
"total_documents": total_docs,
"processed_documents": processed_docs,
"failed_documents": failed_docs,
"processing_documents": processing_docs,
"collection_exists": collection_info["exists"],
"collection_count": collection_info.get("count", 0),
"collection_error": collection_info.get("error")
}
}
async def _save_chat_message(
user_id,
query_data: ChatQuery,
response: ChatResponse,
db: AsyncIOMotorDatabase
):
"""Save chat message to database with proper timestamp"""
try:
current_time = datetime.utcnow()
message_data = ChatMessageCreate(
user_id=user_id,
index_id=query_data.index_id,
query=query_data.query,
response=response.response,
created_at=current_time,
updated_at=current_time
)
message_dict = message_data.dict()
message_dict["debug_info"] = response.debug_info
message_dict["response_time"] = response.response_time
message_dict["cached"] = response.cached
message_dict["sources"] = response.debug_info.get("sources", [])
message_dict["context_used"] = response.debug_info.get("context_used")
message_dict["created_at"] = current_time
message_dict["updated_at"] = current_time
# Add separate timestamps for user message and assistant response
message_dict["user_timestamp"] = current_time
message_dict["assistant_timestamp"] = current_time
message_dict["deleted_by_user"] = False
await db.chat_messages.insert_one(message_dict)
except Exception as e:
print(f"Error saving chat message: {e}")
# Don't fail the request if we can't save the message
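
The readiness rules in `get_index_chat_status` depend only on a handful of counts, so they can be expressed as a pure function and tested without MongoDB or ChromaDB. The sketch below copies the branch order and messages of that handler; the function name and signature are hypothetical.

```python
from typing import Tuple


def index_readiness(
    total_docs: int,
    processed_docs: int,
    processing_docs: int,
    failed_docs: int,
    collection_exists: bool,
    collection_count: int,
) -> Tuple[bool, str]:
    """Return (ready, reason) using the same branch order as the handler."""
    if total_docs == 0:
        return False, "No documents uploaded"
    if processed_docs == 0:
        if processing_docs > 0:
            return False, f"{processing_docs} documents still processing"
        if failed_docs > 0:
            return False, f"All {failed_docs} documents failed to process"
        return False, "No documents have been processed successfully"
    if not collection_exists:
        return False, "Vector database collection not found"
    if collection_count == 0:
        return False, "Vector database collection is empty"
    return True, "Index ready for queries"
```

A handler could pass in the results of its `count_documents` calls and the values from `get_collection_info`, then build the response from the returned tuple.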


@ -0,0 +1,323 @@
from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Form
from fastapi.responses import FileResponse
from motor.motor_asyncio import AsyncIOMotorDatabase
from typing import List
from bson import ObjectId
from datetime import datetime
import os
import asyncio
from urllib.parse import unquote
from ...config.database import get_database
from ...core.auth import get_current_active_user, require_index_access
from ...models.user import UserInDB
from ...models.document import Document, DocumentInDB
from ...models.contract_summary import ContractSummaryResponse
from ...services.document_processor import document_processor
from ...services.rag_service import rag_service
from ...services.llama_processor import llama_processor
router = APIRouter()
@router.post("/upload", response_model=dict)
async def upload_document(
file: UploadFile = File(...),
index_id: str = Form(...),
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Upload a document to an index"""
# Check if user has access to this index
if current_user.role.value != "admin" and index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this index")
# Process the upload using LlamaProcessor (includes contract summary processing)
document = await llama_processor.process_single_file(file, index_id, current_user, db)
return {
"message": "Document uploaded successfully",
"document_id": str(document.id),
"filename": document.filename,
"processing_status": "pending"
}
# Background processing is now handled by LlamaProcessor automatically
@router.get("/index/{index_id}", response_model=List[dict])
async def get_documents_by_index(
index_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get all documents for a specific index"""
# Decode URL-encoded index_id
decoded_index_id = unquote(index_id)
# Check if user has access to this index
if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this index")
documents = await document_processor.get_documents_by_index(decoded_index_id, db)
return [
{
"id": str(doc.id),
"filename": doc.filename,
"original_filename": doc.original_filename,
"file_size": doc.file_size,
"content_type": doc.content_type,
"processing_status": doc.processing_status,
"embedding_status": doc.embedding_status,
"summary_status": getattr(doc, 'summary_status', 'pending'),
"created_at": doc.created_at,
"updated_at": doc.updated_at,
"metadata": doc.metadata,
"chunk_count": doc.chunk_count
}
for doc in documents
]
@router.get("/{document_id}/download")
async def download_document(
document_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Download a document"""
document = await db.documents.find_one({"_id": ObjectId(document_id)})
if not document:
raise HTTPException(status_code=404, detail="Document not found")
doc = DocumentInDB(**document)
# Check if user has access to this document's index
if current_user.role.value != "admin" and doc.index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this document")
# Check if file exists
if not os.path.exists(doc.file_path):
raise HTTPException(status_code=404, detail="File not found on disk")
return FileResponse(
path=doc.file_path,
filename=doc.original_filename,
media_type=doc.content_type
)
@router.get("/{document_id}/text")
async def get_document_text(
document_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get the parsed text content of a document"""
print(f"🔍 DEBUG - Getting document text for ID: {document_id}")
document = await db.documents.find_one({"_id": ObjectId(document_id)})
if not document:
print(f"❌ DEBUG - Document not found: {document_id}")
raise HTTPException(status_code=404, detail="Document not found")
print(f"✅ DEBUG - Document found: {document.get('original_filename', 'Unknown')}")
doc = DocumentInDB(**document)
# Check if user has access to this document's index
if current_user.role.value != "admin" and doc.index_id not in current_user.index_access:
print(f"❌ DEBUG - Access denied to index: {doc.index_id}")
raise HTTPException(status_code=403, detail="Access denied to this document")
# Check if document has been processed
print(f"📊 DEBUG - Processing status: {doc.processing_status}")
if doc.processing_status != "completed":
raise HTTPException(status_code=400, detail="Document not yet processed")
# Check if parsed text exists
parsed_text = getattr(doc, 'parsed_text', None)
print(f"📝 DEBUG - Parsed text length: {len(parsed_text) if parsed_text else 0}")
if not parsed_text:
raise HTTPException(status_code=404, detail="Document text not available")
print("✅ DEBUG - Returning document text successfully")
return {
"success": True,
"document_id": str(doc.id),
"filename": doc.original_filename,
"text": parsed_text,
"text_length": len(parsed_text),
"processing_status": doc.processing_status,
"created_at": doc.created_at,
"updated_at": doc.updated_at
}
@router.get("/{document_id}", response_model=dict)
async def get_document(
document_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get a specific document"""
document = await db.documents.find_one({"_id": ObjectId(document_id)})
if not document:
raise HTTPException(status_code=404, detail="Document not found")
doc = DocumentInDB(**document)
# Check if user has access to this document's index
if current_user.role.value != "admin" and doc.index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this document")
return {
"id": str(doc.id),
"filename": doc.filename,
"original_filename": doc.original_filename,
"file_size": doc.file_size,
"content_type": doc.content_type,
"index_id": doc.index_id,
"processing_status": doc.processing_status,
"created_at": doc.created_at,
"updated_at": doc.updated_at,
"metadata": doc.metadata
}
@router.delete("/{document_id}")
async def delete_document(
document_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Delete a document"""
document = await db.documents.find_one({"_id": ObjectId(document_id)})
if not document:
raise HTTPException(status_code=404, detail="Document not found")
doc = DocumentInDB(**document)
# Only admins can delete documents
if current_user.role.value != "admin":
raise HTTPException(status_code=403, detail="Only administrators can delete documents")
# Delete the document
success = await document_processor.delete_document(document_id, db)
if success:
return {"message": "Document deleted successfully"}
else:
raise HTTPException(status_code=500, detail="Failed to delete document")
@router.get("/{document_id}/summary", response_model=ContractSummaryResponse)
async def get_document_summary(
document_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get structured contract summary of a document"""
document = await db.documents.find_one({"_id": ObjectId(document_id)})
if not document:
raise HTTPException(status_code=404, detail="Document not found")
doc = DocumentInDB(**document)
# Check if user has access to this document's index
if current_user.role.value != "admin" and doc.index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this document")
# Check if document has been processed
if doc.processing_status != "completed":
raise HTTPException(status_code=400, detail="Document not yet processed")
# Get summary status
summary_status = getattr(doc, 'summary_status', 'pending')
# If summary is completed, return structured summary
if summary_status == "completed" and hasattr(doc, 'contract_summary') and doc.contract_summary:
from ...models.contract_summary import ContractSummary
contract_summary = ContractSummary(**doc.contract_summary)
return ContractSummaryResponse(
success=True,
summary=contract_summary,
status=summary_status,
filename=doc.original_filename,
created_at=getattr(doc, 'summary_created_at', doc.created_at).isoformat() if getattr(doc, 'summary_created_at', doc.created_at) else None,
updated_at=doc.updated_at.isoformat() if doc.updated_at else None
)
# If summary is processing, return status
elif summary_status == "processing":
return ContractSummaryResponse(
success=False,
status=summary_status,
filename=doc.original_filename,
error="Contract summary is currently being processed. Please check back in a few moments."
)
# If summary failed, return error
elif summary_status == "failed":
error_msg = "Contract summary extraction failed."
if hasattr(doc, 'metadata') and doc.metadata and 'summary_error' in doc.metadata:
error_msg += f" Error: {doc.metadata['summary_error']}"
return ContractSummaryResponse(
success=False,
status=summary_status,
filename=doc.original_filename,
error=error_msg
)
# If summary is pending, return pending status
else:
return ContractSummaryResponse(
success=False,
status="pending",
filename=doc.original_filename,
error="Contract summary processing has not started yet. Please try again later."
)
@router.post("/{document_id}/summary/reprocess")
async def reprocess_document_summary(
document_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Reprocess contract summary for a document"""
document = await db.documents.find_one({"_id": ObjectId(document_id)})
if not document:
raise HTTPException(status_code=404, detail="Document not found")
doc = DocumentInDB(**document)
# Check if user has access to this document's index
if current_user.role.value != "admin" and doc.index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this document")
# Check if document has been processed
if doc.processing_status != "completed" or not doc.parsed_text:
raise HTTPException(status_code=400, detail="Document not yet processed or text not available")
# Reset summary status and trigger reprocessing
try:
# Reset summary status
await db.documents.update_one(
{"_id": ObjectId(document_id)},
{"$set": {
"summary_status": "pending",
"contract_summary": None,
"summary_created_at": None,
"updated_at": datetime.utcnow()
}}
)
# Trigger summary extraction asynchronously
asyncio.create_task(llama_processor._extract_contract_summary(
document_id, doc.parsed_text, doc.original_filename, db
))
return {
"message": "Contract summary reprocessing started",
"document_id": document_id,
"filename": doc.original_filename,
"status": "processing"
}
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error starting summary reprocessing: {str(e)}")
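
The handlers in this file pass the `document_id` path parameter straight to `ObjectId(...)`. For a string that is not 24 hexadecimal characters, `bson` raises `InvalidId`, which would likely surface as a 500 unless something outside this diff handles it. Below is a minimal standard-library sketch of a pre-check; the helper name is hypothetical, and `ObjectId.is_valid` from `bson` is an alternative when that package is available.

```python
import re

# A string ObjectId is exactly 24 hexadecimal characters.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value: str) -> bool:
    """Return True when value is a 24-character hex string."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None
```

A handler could call this first and answer with a 400 or 404 for malformed ids before querying `db.documents`.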


@ -0,0 +1,265 @@
from fastapi import APIRouter, Depends, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase
from typing import List
from bson import ObjectId
from datetime import datetime
import uuid
from urllib.parse import unquote
from ...config.database import get_database
from ...core.auth import get_current_active_user, get_current_admin_user
from ...models.user import UserInDB
from ...models.index import Index, IndexCreate, IndexInDB
from ...services.rag_service import rag_service
from ...services.document_processor import document_processor
router = APIRouter()
@router.post("/create", response_model=dict)
async def create_index(
index_data: IndexCreate,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Create a new document index"""
# Generate unique index ID
index_id = f"{index_data.name.lower().replace(' ', '-')}-{datetime.now().strftime('%Y-%m-%d')}-{str(uuid.uuid4())[:8]}"
# Create index record
index_dict = index_data.dict()
index_dict["index_id"] = index_id
index_dict["status"] = "active"
index_dict["document_count"] = 0
index_dict["settings"] = {}
index_dict["created_at"] = datetime.utcnow()
index_dict["updated_at"] = datetime.utcnow()
# Insert into database
result = await db.indices.insert_one(index_dict)
index_dict["_id"] = result.inserted_id
# Grant access to the creator
await db.users.update_one(
{"_id": current_user.id},
{"$addToSet": {"index_access": index_id}}
)
return {
"message": "Index created successfully",
"index_id": index_id,
"name": index_data.name,
"id": str(result.inserted_id)
}
@router.get("", response_model=List[dict])
async def get_user_indices(
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get all indices accessible to the current user"""
# Ensure index_access is a list and not None
user_index_access = current_user.index_access if current_user.index_access else []
if current_user.role.value == "admin":
# Admin can see all indices
cursor = db.indices.find({"status": "active"})
else:
# Regular users see only their accessible indices
# If user has no index access, they should see no indices
if not user_index_access:
return []
cursor = db.indices.find({
"index_id": {"$in": user_index_access},
"status": "active"
})
indices = []
async for index in cursor:
index_obj = IndexInDB(**index)
# Double-check access control for non-admin users
if current_user.role.value != "admin":
if index_obj.index_id not in user_index_access:
continue # Skip this index if user doesn't have access
# Get real-time document count instead of stored value
real_document_count = await db.documents.count_documents({"index_id": index_obj.index_id})
indices.append({
"id": str(index_obj.id),
"index_id": index_obj.index_id,
"name": index_obj.name,
"description": index_obj.description,
"document_count": real_document_count,
"created_at": index_obj.created_at,
"updated_at": index_obj.updated_at,
"status": index_obj.status
})
return indices
@router.get("/{index_id}", response_model=dict)
async def get_index_details(
index_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Get details of a specific index"""
# Decode URL-encoded index_id
decoded_index_id = unquote(index_id)
# Check if user has access to this index
if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this index")
index = await db.indices.find_one({"index_id": decoded_index_id})
if not index:
raise HTTPException(status_code=404, detail="Index not found")
index_obj = IndexInDB(**index)
# Get document count
document_count = await db.documents.count_documents({"index_id": decoded_index_id})
# Get documents
documents = await document_processor.get_documents_by_index(decoded_index_id, db)
return {
"id": str(index_obj.id),
"index_id": index_obj.index_id,
"name": index_obj.name,
"description": index_obj.description,
"document_count": document_count,
"created_at": index_obj.created_at,
"updated_at": index_obj.updated_at,
"status": index_obj.status,
"settings": index_obj.settings,
"documents": [
{
"id": str(doc.id),
"filename": doc.filename,
"original_filename": doc.original_filename,
"processing_status": doc.processing_status,
"created_at": doc.created_at
}
for doc in documents
]
}
@router.post("/{index_id}/rebuild")
async def rebuild_index(
index_id: str,
current_user: UserInDB = Depends(get_current_active_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Rebuild the vector index from all documents"""
# Decode URL-encoded index_id
decoded_index_id = unquote(index_id)
# Check if user has access to this index
if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
raise HTTPException(status_code=403, detail="Access denied to this index")
# Get all documents for this index
documents = await document_processor.get_documents_by_index(decoded_index_id, db)
if not documents:
raise HTTPException(status_code=400, detail="No documents found for this index")
# NOTE: Index rebuilding is now handled by reprocessing documents through LlamaProcessor
# Clear existing vectors and reprocess all documents in this index
import asyncio
from ...services.llama_processor import llama_processor
reprocessed_count = 0
for doc in documents:
try:
print(f"Rebuilding document {doc.id}: {doc.original_filename}")
# Step 1: Clear existing vectors from ChromaDB
if doc.vector_ids:
print(f" - Clearing {len(doc.vector_ids)} existing vectors")
await llama_processor.delete_document_embeddings(
str(doc.id),
decoded_index_id
)
# Step 2: Clear document metadata and reset status
await db.documents.update_one(
{"_id": doc.id},
{
"$set": {
"processing_status": "pending",
"embedding_status": "pending",
"summary_status": "pending",
"updated_at": datetime.utcnow()
},
"$unset": {
"parsed_text": "",
"text_chunks": "",
"chunk_count": "",
"vector_ids": "",
"contract_summary": "",
"summary_created_at": ""
}
}
)
# Step 3: Start reprocessing
asyncio.create_task(llama_processor._process_document_async(doc, db))
reprocessed_count += 1
except Exception as e:
print(f"Error queueing document {doc.id} for reprocessing: {e}")
# Update index timestamp
await db.indices.update_one(
{"index_id": decoded_index_id},
{"$set": {"updated_at": datetime.utcnow()}}
)
return {
"message": f"Index rebuild started - {reprocessed_count} documents queued for reprocessing",
"document_count": reprocessed_count
}
@router.delete("/{index_id}")
async def delete_index(
index_id: str,
current_user: UserInDB = Depends(get_current_admin_user),
db: AsyncIOMotorDatabase = Depends(get_database)
):
"""Delete an index (admin only)"""
# Decode URL-encoded index_id
decoded_index_id = unquote(index_id)
index = await db.indices.find_one({"index_id": decoded_index_id})
if not index:
raise HTTPException(status_code=404, detail="Index not found")
# Delete vector index with complete cleanup
deletion_result = await rag_service.delete_index_complete(decoded_index_id)
if not deletion_result["success"]:
print(f"Warning during index deletion: {deletion_result['message']}")
# Delete all documents in this index
documents = await document_processor.get_documents_by_index(decoded_index_id, db)
for doc in documents:
await document_processor.delete_document(str(doc.id), db)
# Note: Cache clearing removed - caching is disabled for data freshness
# Mark index as deleted
await db.indices.update_one(
{"index_id": decoded_index_id},
{"$set": {"status": "deleted", "updated_at": datetime.utcnow()}}
)
# Remove index access from all users
await db.users.update_many(
{"index_access": decoded_index_id},
{"$pull": {"index_access": decoded_index_id}}
)
return {"message": "Index deleted successfully"}
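
The index ID built in `create_index` above follows a `<slug>-<date>-<random suffix>` layout. A minimal standalone sketch of that layout is given here; the helper name `make_index_id` and the injectable `now` parameter are assumptions made for illustration and do not appear in the commit.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_index_id(name: str, now: Optional[datetime] = None) -> str:
    """Build '<slug>-<YYYY-MM-DD>-<8 hex chars>' from a display name."""
    # Hypothetical helper: 'now' is injectable only so the example is repeatable.
    now = now or datetime.now(timezone.utc)
    # Same slug rule as the route: lowercase, spaces replaced by hyphens.
    slug = name.lower().replace(" ", "-")
    # First 8 hex characters of a random UUID keep same-day names distinct.
    suffix = uuid.uuid4().hex[:8]
    return f"{slug}-{now.strftime('%Y-%m-%d')}-{suffix}"


index_id = make_index_id(
    "Vendor Contracts", datetime(2025, 8, 14, tzinfo=timezone.utc)
)
print(index_id)  # slug and date are fixed here; the suffix differs on every run
```

Because only spaces are replaced, other characters in the name (slashes, question marks) end up in the ID unchanged, which is presumably why the routes call `unquote` on the path parameter.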


@ -0,0 +1,12 @@
from .settings import settings
from .database import get_database, get_redis, connect_to_mongo, close_mongo_connection, connect_to_redis, close_redis_connection
__all__ = [
"settings",
"get_database",
"get_redis",
"connect_to_mongo",
"close_mongo_connection",
"connect_to_redis",
"close_redis_connection"
]


@ -0,0 +1,56 @@
from motor.motor_asyncio import AsyncIOMotorClient
from typing import Optional
import redis.asyncio as redis
from .settings import settings
class Database:
client: Optional[AsyncIOMotorClient] = None
database = None
redis_client: Optional[redis.Redis] = None
db = Database()
async def connect_to_mongo():
"""Create database connection"""
db.client = AsyncIOMotorClient(settings.mongodb_url)
db.database = db.client[settings.database_name]
# Test connection
try:
await db.client.admin.command('ping')
print("Connected to MongoDB successfully!")
except Exception as e:
print(f"Error connecting to MongoDB: {e}")
raise
async def close_mongo_connection():
"""Close database connection"""
if db.client:
db.client.close()
print("Disconnected from MongoDB")
async def connect_to_redis():
"""Create Redis connection"""
if settings.cache_enabled:
try:
db.redis_client = redis.from_url(settings.redis_url)
await db.redis_client.ping()
print("Connected to Redis successfully!")
except Exception as e:
print(f"Error connecting to Redis: {e}")
print("Continuing without Redis cache...")
db.redis_client = None
async def close_redis_connection():
"""Close Redis connection"""
if db.redis_client:
await db.redis_client.close()
print("Disconnected from Redis")
def get_database():
"""Get database instance"""
return db.database
def get_redis():
"""Get Redis instance"""
return db.redis_client


@ -0,0 +1,53 @@
from pydantic_settings import BaseSettings
from typing import List, Optional
import os
class Settings(BaseSettings):
# Database
mongodb_url: str = "mongodb://localhost:27017"
database_name: str = "contract_analysis"
# Redis
redis_url: str = "redis://localhost:6379"
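    # Authentication
    # Placeholder default only: override jwt_secret_key via the environment or .env in any real deployment
    jwt_secret_key: str = "your-super-secret-jwt-key-change-this-in-production"
    jwt_algorithm: str = "HS256"
    jwt_expire_minutes: int = 180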
# Azure AD / SSO Configuration
azure_client_id: Optional[str] = None
azure_tenant_id: Optional[str] = None
azure_redirect_uri: Optional[str] = None
azure_authority: Optional[str] = None
sso_enabled: bool = False
allow_local_admin: bool = True
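    # API keys: required, no defaults, so startup fails if either is missing from the environment or .env
    openai_api_key: str
    llamaparse_api_key: str  # Required for document processing
    # Application
    debug: bool = True
    # NOTE: the "*" entry allows every origin and makes the explicit entries redundant; remove it in production
    cors_origins: List[str] = ["http://localhost:3000", "http://localhost:3002", "https://ai-sandbox.oliver.solutions", "*"]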
upload_dir: str = "./uploads"
indices_dir: str = "./indices"
# Document processing limits
max_document_chars: int = 1000000 # 1 million characters for contract summaries
max_summary_chars: int = 100000 # 100k characters for simple summaries
# Cache - DISABLED for data freshness and debugging
cache_enabled: bool = False
cache_ttl: int = 3600
class Config:
env_file = ".env"
case_sensitive = False
# Create settings instance
settings = Settings()
# Ensure directories exist
os.makedirs(settings.upload_dir, exist_ok=True)
os.makedirs(settings.indices_dir, exist_ok=True)


@ -0,0 +1,15 @@
from .auth import get_current_user, get_current_active_user, get_current_admin_user, has_index_access, require_index_access
from .security import verify_password, get_password_hash, create_access_token, verify_token
# Cache import removed - caching disabled for data freshness
__all__ = [
"get_current_user",
"get_current_active_user",
"get_current_admin_user",
"has_index_access",
"require_index_access",
"verify_password",
"get_password_hash",
"create_access_token",
"verify_token"
]

81
backend/app/core/auth.py Normal file

@ -0,0 +1,81 @@
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from typing import Optional
from motor.motor_asyncio import AsyncIOMotorDatabase
from bson import ObjectId
from .security import verify_token
from ..config.database import get_database
from ..models.user import UserInDB, UserRole
security = HTTPBearer()
async def get_current_user(
    credentials: HTTPAuthorizationCredentials = Depends(security),
    db: AsyncIOMotorDatabase = Depends(get_database)
) -> UserInDB:
    """Get current authenticated user"""
    token = credentials.credentials
    payload = verify_token(token)
    user_id = payload.get("sub")
    # Reject a missing or malformed subject with 401. Passing an invalid value
    # straight to ObjectId() would raise InvalidId and surface as a 500.
    if user_id is None or not ObjectId.is_valid(user_id):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Could not validate credentials",
            headers={"WWW-Authenticate": "Bearer"},
        )
    user = await db.users.find_one({"_id": ObjectId(user_id)})
    if user is None:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="User not found",
            headers={"WWW-Authenticate": "Bearer"},
        )
    return UserInDB(**user)
async def get_current_active_user(
current_user: UserInDB = Depends(get_current_user)
) -> UserInDB:
"""Get current active user"""
if not current_user.is_active:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Inactive user"
)
return current_user
async def get_current_admin_user(
current_user: UserInDB = Depends(get_current_active_user)
) -> UserInDB:
"""Get current admin user"""
if current_user.role != UserRole.ADMIN:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Not enough permissions"
)
return current_user
async def has_index_access(
index_id: str,
current_user: UserInDB = Depends(get_current_active_user)
) -> bool:
"""Check if user has access to specific index"""
# Admin users have access to all indices
if current_user.role == UserRole.ADMIN:
return True
# Check if user has explicit access to this index
return index_id in current_user.index_access
def require_index_access(index_id: str):
"""Dependency to require index access"""
async def check_access(current_user: UserInDB = Depends(get_current_active_user)):
if not await has_index_access(index_id, current_user):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Access denied to this index"
)
return current_user
return check_access
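
The access rule implemented by `has_index_access` above (admins see everything, other users only the indices listed in `index_access`) can be sketched without FastAPI or MongoDB. The `DemoUser` dataclass and `can_access` function are stand-ins invented for this example, not part of the commit.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(str, Enum):
    ADMIN = "admin"
    USER = "user"


@dataclass
class DemoUser:
    # Hypothetical stand-in for UserInDB with only the fields the rule needs.
    role: Role = Role.USER
    index_access: List[str] = field(default_factory=list)


def can_access(user: DemoUser, index_id: str) -> bool:
    """Admins may access every index; others need an explicit grant."""
    if user.role == Role.ADMIN:
        return True
    return index_id in user.index_access


admin = DemoUser(role=Role.ADMIN)
member = DemoUser(index_access=["acme-2025-08-14-1a2b3c4d"])

print(can_access(admin, "any-index"))
print(can_access(member, "acme-2025-08-14-1a2b3c4d"))
print(can_access(member, "other-index"))
```

Keeping the rule in one pure function like this makes it easy to unit test separately from the request-handling code.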

160
backend/app/core/cache.py Normal file

@ -0,0 +1,160 @@
import json
import hashlib
from typing import Optional, Any
from ..config.database import get_redis
from ..config.settings import settings
class CacheService:
def __init__(self):
self.redis = None
self.enabled = settings.cache_enabled
async def get_redis(self):
"""Get Redis client"""
if not self.redis:
self.redis = get_redis()
return self.redis
def _generate_key(self, prefix: str, *args) -> str:
"""Generate cache key"""
key_data = f"{prefix}:{':'.join(str(arg) for arg in args)}"
return hashlib.md5(key_data.encode()).hexdigest()
async def get(self, key: str) -> Optional[Any]:
"""Get value from cache"""
if not self.enabled:
return None
redis_client = await self.get_redis()
if not redis_client:
return None
try:
cached_data = await redis_client.get(key)
if cached_data:
return json.loads(cached_data)
except Exception as e:
print(f"Cache get error: {e}")
return None
async def set(self, key: str, value: Any, ttl: Optional[int] = None) -> bool:
"""Set value in cache"""
if not self.enabled:
return False
redis_client = await self.get_redis()
if not redis_client:
return False
try:
serialized_value = json.dumps(value, default=str)
ttl = ttl or settings.cache_ttl
await redis_client.setex(key, ttl, serialized_value)
return True
except Exception as e:
print(f"Cache set error: {e}")
return False
async def delete(self, key: str) -> bool:
"""Delete value from cache"""
if not self.enabled:
return False
redis_client = await self.get_redis()
if not redis_client:
return False
try:
await redis_client.delete(key)
return True
except Exception as e:
print(f"Cache delete error: {e}")
return False
async def clear_pattern(self, pattern: str) -> bool:
"""Clear cache entries matching pattern"""
if not self.enabled:
return False
redis_client = await self.get_redis()
if not redis_client:
return False
try:
keys = await redis_client.keys(pattern)
if keys:
await redis_client.delete(*keys)
return True
except Exception as e:
print(f"Cache clear pattern error: {e}")
return False
    def get_chat_cache_key(self, query: str, index_id: str) -> str:
        """Generate cache key for chat responses"""
        # Hash only the query so the key stays short, but keep the "chat" prefix
        # and the index_id readable: clear_index_cache() matches on them.
        # (A fully hashed key can never be found by prefix or suffix.)
        query_hash = hashlib.md5(query.encode()).hexdigest()
        return f"chat:{query_hash}:{index_id}"

    def get_document_cache_key(self, index_id: str) -> str:
        """Generate cache key for document lists"""
        return self._generate_key("documents", index_id)

    def get_index_cache_key(self, user_id: str) -> str:
        """Generate cache key for user indices"""
        return self._generate_key("indices", user_id)

    def get_all_cache_keys_for_index(self, index_id: str) -> list[str]:
        """Get all cache keys that should be cleared for a specific index"""
        return [
            self.get_document_cache_key(index_id),
            # Chat cache keys are query-specific, so pattern matching is used for those
        ]

    async def clear_index_cache(self, index_id: str) -> bool:
        """Clear only cache entries for specific index"""
        if not self.enabled:
            return False
        try:
            # Clear document cache for this index
            document_key = self.get_document_cache_key(index_id)
            await self.delete(document_key)
            # Clear chat cache entries for this index.
            # Chat keys have the form "chat:<query hash>:<index_id>".
            redis_client = await self.get_redis()
            if redis_client:
                chat_pattern = f"chat:*:{index_id}"
                # SCAN walks the keyspace incrementally instead of blocking Redis like KEYS "*"
                chat_keys_to_delete = [
                    key async for key in redis_client.scan_iter(match=chat_pattern)
                ]
                if chat_keys_to_delete:
                    await redis_client.delete(*chat_keys_to_delete)
            return True
        except Exception as e:
            print(f"Error clearing cache for index {index_id}: {e}")
            return False
async def clear_user_index_cache(self, user_id: str) -> bool:
"""Clear index cache for a specific user"""
if not self.enabled:
return False
try:
user_index_key = self.get_index_cache_key(user_id)
await self.delete(user_index_key)
return True
except Exception as e:
print(f"Error clearing user index cache for {user_id}: {e}")
return False
# Global cache instance
cache = CacheService()
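
Per-index invalidation only works if the index ID is still visible in the key. The sketch below illustrates one such layout, hashing just the query; the function names are assumptions for this example, and `fnmatch` is used as a rough stand-in for Redis glob matching rather than a real Redis call.

```python
import fnmatch
import hashlib
from typing import Iterable, List


def chat_key(query: str, index_id: str) -> str:
    """Key layout 'chat:<md5 of query>:<index_id>' (illustrative only)."""
    digest = hashlib.md5(query.encode()).hexdigest()
    return f"chat:{digest}:{index_id}"


def keys_for_index(keys: Iterable[str], index_id: str) -> List[str]:
    """Select the chat keys of one index, approximating a Redis MATCH pattern."""
    pattern = f"chat:*:{index_id}"
    return [key for key in keys if fnmatch.fnmatchcase(key, pattern)]


stored = [
    chat_key("What is the term?", "acme-2025"),
    chat_key("Who signs?", "acme-2025"),
    chat_key("Who signs?", "beta-2025"),
]
matched = keys_for_index(stored, "acme-2025")
print(len(matched))

# A key that is hashed as a whole carries no readable index_id, so it cannot be selected.
fully_hashed = hashlib.md5(b"chat:Who signs?:acme-2025").hexdigest()
print(keys_for_index([fully_hashed], "acme-2025"))
```

Index IDs that contain glob characters such as `*` or `[` would need escaping before being used in a pattern; that case is left out of the sketch.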


@ -0,0 +1,54 @@
"""
Shared ChromaDB client singleton to prevent initialization conflicts
"""
import chromadb
from chromadb.config import Settings as ChromaSettings
from pathlib import Path
from typing import Optional
class ChromaDBSingleton:
"""Singleton ChromaDB client to prevent 'different settings' errors"""
_instance: Optional['ChromaDBSingleton'] = None
_client: Optional[chromadb.PersistentClient] = None
def __new__(cls):
if cls._instance is None:
cls._instance = super().__new__(cls)
return cls._instance
def get_client(self, chroma_db_path: str) -> chromadb.PersistentClient:
"""Get or create ChromaDB client with consistent settings"""
if self._client is None:
try:
# Use consistent settings across all services
settings = ChromaSettings(anonymized_telemetry=False)
self._client = chromadb.PersistentClient(
path=chroma_db_path,
settings=settings
)
print(f"Created shared ChromaDB client at {chroma_db_path}")
except Exception as e:
print(f"Error creating shared ChromaDB client: {e}")
# Try with minimal settings if the above fails
try:
self._client = chromadb.PersistentClient(path=chroma_db_path)
print(f"Created ChromaDB client with default settings at {chroma_db_path}")
except Exception as e2:
print(f"Failed to create ChromaDB client: {e2}")
raise e2
return self._client
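    @classmethod
    def reset(cls):
        """Reset the singleton (useful for testing or reinitialization)"""
        # Drop the cached client so the next get_client() call creates a new one.
        # client.reset() is deliberately not called here: in ChromaDB it wipes the
        # whole database (and raises unless allow_reset is enabled); it does not
        # close the client.
        if cls._instance is not None:
            # The client is stored on the instance, which the module-level
            # chroma_singleton also references, so clear it there.
            cls._instance._client = None
        cls._client = None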
# Global singleton instance
chroma_singleton = ChromaDBSingleton()


@ -0,0 +1,41 @@
from datetime import datetime, timedelta
from typing import Optional, Union
from jose import JWTError, jwt
from passlib.context import CryptContext
from fastapi import HTTPException, status
from ..config.settings import settings
# Password hashing
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
def verify_password(plain_password: str, hashed_password: str) -> bool:
"""Verify a password against its hash"""
return pwd_context.verify(plain_password, hashed_password)
def get_password_hash(password: str) -> str:
"""Hash a password"""
return pwd_context.hash(password)
def create_access_token(data: dict, expires_delta: Optional[timedelta] = None) -> str:
"""Create a JWT access token"""
to_encode = data.copy()
if expires_delta:
expire = datetime.utcnow() + expires_delta
else:
expire = datetime.utcnow() + timedelta(minutes=settings.jwt_expire_minutes)
to_encode.update({"exp": expire})
encoded_jwt = jwt.encode(to_encode, settings.jwt_secret_key, algorithm=settings.jwt_algorithm)
return encoded_jwt
def verify_token(token: str) -> dict:
"""Verify and decode a JWT token"""
try:
payload = jwt.decode(token, settings.jwt_secret_key, algorithms=[settings.jwt_algorithm])
return payload
except JWTError:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials",
headers={"WWW-Authenticate": "Bearer"},
)
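
To show what `create_access_token` and `verify_token` above rely on, here is a standard-library sketch of an HS256-signed token with an `exp` claim. It is not the python-jose implementation and should not replace it; `sign_token` and `check_token` are names invented for this illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped during encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: str, ttl_seconds: int = 180 * 60) -> str:
    """Return 'header.payload.signature' signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def check_token(token: str, secret: str) -> dict:
    """Verify signature and expiry; raise ValueError on any failure."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    signing_input = f"{header_b64}.{payload_b64}"
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload


token = sign_token({"sub": "64f0c0ffee0123456789abcd"}, "demo-secret")
print(check_token(token, "demo-secret")["sub"])
```

The 180-minute default mirrors `jwt_expire_minutes` in the settings file; the secret and subject values are placeholders.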

86
backend/app/main.py Normal file

@ -0,0 +1,86 @@
from fastapi import FastAPI, Request #type: ignore
from fastapi.middleware.cors import CORSMiddleware #type: ignore
from fastapi.responses import JSONResponse #type: ignore
import time
import uvicorn #type: ignore
from .config import settings, connect_to_mongo, close_mongo_connection, connect_to_redis, close_redis_connection
from .api.v1 import auth, documents, indices, chat, admin
# Create FastAPI app
app = FastAPI(
title="Contract Analysis API",
description="FastAPI backend for intelligent contract analysis and document Q&A",
version="2.0.0",
docs_url="/docs" if settings.debug else None,
redoc_url="/redoc" if settings.debug else None,
)
# Add CORS middleware
app.add_middleware(
CORSMiddleware,
allow_origins=settings.cors_origins,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Add request timing middleware
@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
start_time = time.time()
response = await call_next(request)
process_time = time.time() - start_time
response.headers["X-Process-Time"] = str(process_time)
return response
# Exception handlers
@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
if settings.debug:
return JSONResponse(
status_code=500,
content={"detail": str(exc), "type": type(exc).__name__}
)
return JSONResponse(
status_code=500,
content={"detail": "Internal server error"}
)
# Startup and shutdown events
@app.on_event("startup")
async def startup_event():
await connect_to_mongo()
await connect_to_redis()
print("✅ Application startup complete")
@app.on_event("shutdown")
async def shutdown_event():
await close_mongo_connection()
await close_redis_connection()
print("✅ Application shutdown complete")
# Include routers
app.include_router(auth.router, prefix="/api/v1/auth", tags=["authentication"])
app.include_router(documents.router, prefix="/api/v1/documents", tags=["documents"])
app.include_router(indices.router, prefix="/api/v1/indices", tags=["indices"])
app.include_router(chat.router, prefix="/api/v1/chat", tags=["chat"])
app.include_router(admin.router, prefix="/api/v1/admin", tags=["admin"])
# Health check endpoint
@app.get("/health")
async def health_check():
return {"status": "healthy", "version": "2.0.0"}
# Root endpoint
@app.get("/")
async def root():
return {"message": "Contract Analysis API", "version": "2.0.0"}
if __name__ == "__main__":
uvicorn.run(
"app.main:app",
host="0.0.0.0",
port=8000,
reload=settings.debug
)


@ -0,0 +1,25 @@
from .user import User, UserCreate, UserUpdate, UserInDB, UserRole
from .document import Document, DocumentCreate, DocumentUpdate, DocumentInDB
from .index import Index, IndexCreate, IndexUpdate, IndexInDB
from .chat import ChatMessage, ChatMessageCreate, ChatMessageInDB, ChatQuery, ChatResponse
__all__ = [
"User",
"UserCreate",
"UserUpdate",
"UserInDB",
"UserRole",
"Document",
"DocumentCreate",
"DocumentUpdate",
"DocumentInDB",
"Index",
"IndexCreate",
"IndexUpdate",
"IndexInDB",
"ChatMessage",
"ChatMessageCreate",
"ChatMessageInDB",
"ChatQuery",
"ChatResponse"
]


@ -0,0 +1,53 @@
from pydantic import BaseModel, Field
from typing import Optional, List, Dict, Any
from datetime import datetime
from bson import ObjectId
from .user import PyObjectId
class ChatMessageBase(BaseModel):
user_id: PyObjectId
index_id: str
query: str
response: str
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
deleted_by_user: bool = False
class ChatMessageCreate(ChatMessageBase):
pass
class ChatMessageInDB(ChatMessageBase):
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
debug_info: Dict[str, Any] = Field(default_factory=dict)
response_time: float = 0.0
cached: bool = False
sources: List[Dict[str, Any]] = Field(default_factory=list)
context_used: Optional[str] = None
class Config:
populate_by_name = True
arbitrary_types_allowed = True
json_encoders = {ObjectId: str}
class ChatMessage(ChatMessageBase):
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
debug_info: Dict[str, Any] = Field(default_factory=dict)
response_time: float = 0.0
cached: bool = False
sources: List[Dict[str, Any]] = Field(default_factory=list)
context_used: Optional[str] = None
class Config:
populate_by_name = True
arbitrary_types_allowed = True
json_encoders = {ObjectId: str}
class ChatQuery(BaseModel):
query: str
index_id: str
class ChatResponse(BaseModel):
response: str
debug_info: Dict[str, Any] = Field(default_factory=dict)
cached: bool = False
response_time: float = 0.0


@ -0,0 +1,110 @@
from pydantic import BaseModel, Field
from typing import Optional, Dict, Any
class ScopeOfWork(BaseModel):
summary_tasks_deliverables: Optional[str] = None
key_dates: Optional[str] = None
key_kpis: Optional[str] = None
class TermsAndTermination(BaseModel):
duration: Optional[str] = None
termination_conditions: Optional[str] = None
penalties: Optional[str] = None
class PaymentTerms(BaseModel):
payment_method: Optional[str] = None
payment_schedule: Optional[str] = None
pricing_details: Optional[str] = None
mark_ups: Optional[str] = None
payment_schedules: Optional[str] = None
late_payment_penalties: Optional[str] = None
discounts: Optional[str] = None
class LiabilityIndemnification(BaseModel):
responsibilities_liabilities: Optional[str] = None
damages_losses: Optional[str] = None
indemnification_clauses: Optional[str] = None
class Confidentiality(BaseModel):
scope: Optional[str] = None
duration: Optional[str] = None
exceptions: Optional[str] = None
disclosures_by_law: Optional[str] = None
breach_consequences: Optional[str] = None
class IntellectualProperty(BaseModel):
licensor: Optional[str] = None
licensee: Optional[str] = None
terms_renewal: Optional[str] = None
pricing: Optional[str] = None
definitions: Optional[str] = None
scope: Optional[str] = None
duration: Optional[str] = None
territory: Optional[str] = None
use_ownership_rights: Optional[str] = None
class DisputeResolution(BaseModel):
methods: Optional[str] = None
mediation_options: Optional[str] = None
arbitration_options: Optional[str] = None
litigation_options: Optional[str] = None
class WarrantiesRepresentations(BaseModel):
service_standards: Optional[str] = None
service_assurances: Optional[str] = None
class ComplianceWithLaws(BaseModel):
relevant_laws: Optional[str] = None
owner_obligations: Optional[str] = None
class AmendmentsVersions(BaseModel):
change_management: Optional[str] = None
written_consent: Optional[str] = None
class AssignmentSubcontracting(BaseModel):
delegation_assignment: Optional[str] = None
class ContractSummary(BaseModel):
"""Complete contract summary schema matching the reference implementation"""
# Basic contract information
contract_type: Optional[str] = None
overview_purpose: Optional[str] = None
relevant_account: Optional[str] = None
in_studio_name: Optional[str] = None
client_sender_name: Optional[str] = None
client_sender_address: Optional[str] = None
agency_name: Optional[str] = None
agency_address: Optional[str] = None
dates_signed: Optional[str] = None
terms: Optional[str] = None
date_expired: Optional[str] = None
pricing_payment_terms: Optional[str] = None
# Complex nested sections
scope_of_work: Optional[ScopeOfWork] = None
terms_and_termination: Optional[TermsAndTermination] = None
payment_terms: Optional[PaymentTerms] = None
liability_indemnification: Optional[LiabilityIndemnification] = None
confidentiality: Optional[Confidentiality] = None
intellectual_property: Optional[IntellectualProperty] = None
dispute_resolution: Optional[DisputeResolution] = None
warranties_representations: Optional[WarrantiesRepresentations] = None
compliance_with_laws: Optional[ComplianceWithLaws] = None
amendments_versions: Optional[AmendmentsVersions] = None
assignment_subcontracting: Optional[AssignmentSubcontracting] = None
class Config:
json_encoders = {
type(None): lambda v: "N/A (Not found in Doc)"
}
class ContractSummaryResponse(BaseModel):
"""Response model for contract summary API"""
success: bool
summary: Optional[ContractSummary] = None
status: str
filename: Optional[str] = None
created_at: Optional[str] = None
updated_at: Optional[str] = None
error: Optional[str] = None


@ -0,0 +1,59 @@
from pydantic import BaseModel, Field
from typing import Optional, List, Dict, Any
from datetime import datetime
from bson import ObjectId
from .user import PyObjectId
class DocumentBase(BaseModel):
filename: str
original_filename: str
file_size: int
content_type: str
index_id: str
uploaded_by: PyObjectId
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
class DocumentCreate(DocumentBase):
pass
class DocumentUpdate(BaseModel):
filename: Optional[str] = None
updated_at: Optional[datetime] = None
class DocumentInDB(DocumentBase):
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
file_path: str
processing_status: str = "pending" # pending, processing, completed, failed
metadata: Dict[str, Any] = Field(default_factory=dict)
parsed_text: Optional[str] = None
text_chunks: Optional[List[str]] = None
embedding_status: str = "pending" # pending, processing, completed, failed
chunk_count: int = 0
vector_ids: Optional[List[str]] = None
contract_summary: Optional[Dict[str, Any]] = None
summary_status: str = "pending" # pending, processing, completed, failed
summary_created_at: Optional[datetime] = None
class Config:
populate_by_name = True
arbitrary_types_allowed = True
json_encoders = {ObjectId: str}
class Document(DocumentBase):
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
processing_status: str = "pending"
metadata: Dict[str, Any] = Field(default_factory=dict)
parsed_text: Optional[str] = None
text_chunks: Optional[List[str]] = None
embedding_status: str = "pending"
chunk_count: int = 0
vector_ids: Optional[List[str]] = None
contract_summary: Optional[Dict[str, Any]] = None
summary_status: str = "pending"
summary_created_at: Optional[datetime] = None
class Config:
populate_by_name = True
arbitrary_types_allowed = True
json_encoders = {ObjectId: str}


@ -0,0 +1,52 @@
from pydantic import BaseModel, Field
from typing import Optional, List, Dict, Any
from datetime import datetime
from bson import ObjectId
from .user import PyObjectId
class IndexBase(BaseModel):
name: str
description: Optional[str] = None
created_by: PyObjectId
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
class IndexCreate(IndexBase):
pass
class IndexUpdate(BaseModel):
name: Optional[str] = None
description: Optional[str] = None
updated_at: Optional[datetime] = None
class IndexInDB(IndexBase):
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
index_id: str # Unique string identifier for the index
status: str = "active" # active, inactive, deleted
document_count: int = 0
settings: Dict[str, Any] = Field(default_factory=dict)
vector_store_path: Optional[str] = None
embedding_model: str = "text-embedding-ada-002"
chunk_size: int = 1000
chunk_overlap: int = 200
class Config:
populate_by_name = True
arbitrary_types_allowed = True
json_encoders = {ObjectId: str}
class Index(IndexBase):
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
index_id: str
status: str = "active"
document_count: int = 0
settings: Dict[str, Any] = Field(default_factory=dict)
vector_store_path: Optional[str] = None
embedding_model: str = "text-embedding-ada-002"
chunk_size: int = 1000
chunk_overlap: int = 200
class Config:
populate_by_name = True
arbitrary_types_allowed = True
json_encoders = {ObjectId: str}

105
backend/app/models/user.py Normal file

@ -0,0 +1,105 @@
from pydantic import BaseModel, Field, EmailStr
from typing import Optional, List, Dict
from datetime import datetime
from bson import ObjectId
from enum import Enum
class UserRole(str, Enum):
ADMIN = "admin"
USER = "user"
class AuthMethod(str, Enum):
LOCAL = "local"
SSO = "sso"
class PyObjectId(ObjectId):
@classmethod
def __get_pydantic_core_schema__(cls, _source_type, _handler):
from pydantic_core import core_schema
def validate_from_str(input_value: str) -> ObjectId:
return ObjectId(input_value)
def validate_from_objectid(input_value: ObjectId) -> ObjectId:
return input_value
return core_schema.union_schema([
core_schema.is_instance_schema(ObjectId),
core_schema.no_info_plain_validator_function(
validate_from_str,
serialization=core_schema.to_string_ser_schema(),
),
])
@classmethod
def __get_pydantic_json_schema__(cls, core_schema, handler):
json_schema = handler(core_schema)
json_schema.update(type="string")
return json_schema
class UserBase(BaseModel):
email: EmailStr
role: UserRole = UserRole.USER
is_active: bool = True
auth_method: AuthMethod = AuthMethod.LOCAL
sso_provider: Optional[str] = None
sso_user_id: Optional[str] = None
sso_email: Optional[str] = None
sso_name: Optional[str] = None
sso_attributes: Optional[Dict] = None
last_sso_login: Optional[datetime] = None
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
class UserCreate(UserBase):
password: str
class UserUpdate(BaseModel):
email: Optional[EmailStr] = None
role: Optional[UserRole] = None
is_active: Optional[bool] = None
password: Optional[str] = None
auth_method: Optional[AuthMethod] = None
sso_provider: Optional[str] = None
sso_user_id: Optional[str] = None
sso_email: Optional[str] = None
sso_name: Optional[str] = None
sso_attributes: Optional[Dict] = None
last_sso_login: Optional[datetime] = None
class UserInDB(UserBase):
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
hashed_password: Optional[str] = None # Optional for SSO users
index_access: List[str] = Field(default_factory=list) # List of index IDs user has access to
class Config:
populate_by_name = True
arbitrary_types_allowed = True
json_encoders = {ObjectId: str}
class User(UserBase):
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
index_access: List[str] = Field(default_factory=list)
class Config:
populate_by_name = True
arbitrary_types_allowed = True
json_encoders = {ObjectId: str}
class UserResponse(BaseModel):
id: PyObjectId = Field(alias="_id")
email: EmailStr
role: UserRole
is_active: bool
auth_method: AuthMethod
sso_provider: Optional[str] = None
sso_name: Optional[str] = None
last_sso_login: Optional[datetime] = None
index_access: List[str]
created_at: Optional[datetime] = None
updated_at: Optional[datetime] = None
class Config:
populate_by_name = True
arbitrary_types_allowed = True
json_encoders = {ObjectId: str}


@ -0,0 +1,7 @@
from .document_processor import document_processor
from .rag_service import rag_service
__all__ = [
"document_processor",
"rag_service"
]


@ -0,0 +1,192 @@
import asyncio
from typing import List, Dict, Any, Optional
from datetime import datetime, timedelta
from motor.motor_asyncio import AsyncIOMotorDatabase
from bson import ObjectId
# Cache import removed - caching disabled for data freshness
from ..config.settings import settings
from .llama_processor import llama_processor
class ChatContextService:
def __init__(self):
self.max_context_messages = 10 # Maximum number of previous messages to include
self.context_window_hours = 24 # Context window in hours
async def get_conversation_context(
self,
user_id: str,
index_id: str,
db: AsyncIOMotorDatabase,
limit: Optional[int] = None
) -> List[Dict[str, Any]]:
"""Get recent conversation context for the user and index"""
try:
# Use provided limit or default
message_limit = limit or self.max_context_messages
# Get recent messages within the context window
cutoff_time = datetime.utcnow() - timedelta(hours=self.context_window_hours)
cursor = db.chat_messages.find({
"user_id": ObjectId(user_id),
"index_id": index_id,
"created_at": {"$gte": cutoff_time},
"deleted_by_user": {"$ne": True}
}).sort("created_at", -1).limit(message_limit)
messages = []
async for msg in cursor:
messages.append({
"query": msg["query"],
"response": msg["response"],
"created_at": msg["created_at"]
})
# Return in chronological order (oldest first)
return list(reversed(messages))
except Exception as e:
print(f"Error getting conversation context: {e}")
return []
def format_context_for_ai(self, context_messages: List[Dict[str, Any]]) -> str:
"""Format conversation context for AI prompt"""
if not context_messages:
return ""
formatted_context = []
for msg in context_messages:
formatted_context.append(f"User: {msg['query']}")
formatted_context.append(f"Assistant: {msg['response']}")
return "\n".join(formatted_context)
async def generate_contextual_response(
self,
query: str,
index_id: str,
user_id: str,
db: AsyncIOMotorDatabase,
context_chunks: List[str]
) -> Dict[str, Any]:
"""Generate response with conversation context"""
try:
# Get conversation context
context_messages = await self.get_conversation_context(
user_id, index_id, db
)
# Format conversation context
conversation_context = self.format_context_for_ai(context_messages)
# Prepare document context
document_context = "\n\n".join(context_chunks)
# Create enhanced prompt with conversation context
prompt = self._create_contextual_prompt(
query, document_context, conversation_context
)
# Generate response using OpenAI
from llama_index.llms.openai import OpenAI
llm = OpenAI(
model="gpt-4o",
api_key=settings.openai_api_key,
temperature=0.1
)
# acomplete has issues, so run the sync completion in a worker thread to avoid blocking the event loop
response = await asyncio.to_thread(llm.complete, prompt)
return {
"response": response.text,
"context_used": conversation_context,
"context_messages_count": len(context_messages)
}
except Exception as e:
print(f"Error generating contextual response: {e}")
# Fallback to basic response without context
try:
from llama_index.llms.openai import OpenAI
llm = OpenAI(
model="gpt-4o",
api_key=settings.openai_api_key,
temperature=0.1
)
# Simple prompt without context
context_text = "\n\n".join(context_chunks)
simple_prompt = f"""Based on the following context, answer the user's question. If the answer is not in the context, say "I don't have enough information to answer that question."
Return results as pure markdown - no code block.
Context:
{context_text}
Question: {query}
Answer:"""
response = await asyncio.to_thread(llm.complete, simple_prompt)
return {
"response": response.text,
"context_used": None,
"context_messages_count": 0
}
except Exception as fallback_error:
print(f"Fallback response generation failed: {fallback_error}")
return {
"response": "I'm sorry, I encountered an error while processing your question. Please try again.",
"context_used": None,
"context_messages_count": 0
}
def _create_contextual_prompt(
self,
query: str,
document_context: str,
conversation_context: str
) -> str:
"""Create a contextual prompt for the AI"""
prompt_parts = []
prompt_parts.append(
"You are an AI assistant helping users understand their documents. "
"Answer questions based on the provided document context and consider "
"the conversation history for continuity."
)
if conversation_context:
prompt_parts.append(f"""
Previous conversation:
{conversation_context}
""")
prompt_parts.append(f"""
Document context:
{document_context}
Current question: {query}
Instructions:
1. Answer based primarily on the document context provided
2. Consider the conversation history for continuity and context
3. If the answer is not in the documents, clearly state this
4. Be concise but comprehensive
5. Reference specific information from the documents when possible
6. If referring to previous parts of the conversation, be explicit about it
7. Return results as pure markdown - no code block
Answer:""")
return "\n".join(prompt_parts)
# cache_context_key method removed - caching disabled for data freshness
# Global chat context service instance
chat_context_service = ChatContextService()
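
The context handling above can be exercised without MongoDB or OpenAI. The following is a minimal sketch, not the service itself: `format_context` and `build_prompt` are hypothetical stand-ins for `format_context_for_ai` and `_create_contextual_prompt`, and the sample rows are made up. It shows the two behaviours that matter: rows fetched newest-first are reversed into chronological order, and the history section is left out of the prompt when there is no history.

```python
from typing import Any, Dict, List


def format_context(messages_newest_first: List[Dict[str, Any]]) -> str:
    """Turn newest-first chat rows into an oldest-first transcript."""
    lines = []
    for msg in reversed(messages_newest_first):
        lines.append(f"User: {msg['query']}")
        lines.append(f"Assistant: {msg['response']}")
    return "\n".join(lines)


def build_prompt(query: str, document_context: str, conversation_context: str) -> str:
    """Assemble a prompt; the history section is omitted when it is empty."""
    parts = ["Answer from the document context below."]
    if conversation_context:
        parts.append(f"Previous conversation:\n{conversation_context}")
    parts.append(f"Document context:\n{document_context}")
    parts.append(f"Current question: {query}")
    return "\n\n".join(parts)


# Hypothetical rows, ordered as a created_at-descending query would return them
rows = [
    {"query": "And the notice period?", "response": "30 days."},
    {"query": "Who are the parties?", "response": "Acme and Globex."},
]
history = format_context(rows)
prompt = build_prompt("Can it be shortened?", "Clause 12 text", history)
```

Keeping these two steps as pure functions makes the prompt layout testable without touching the database or the LLM client.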


@ -0,0 +1,238 @@
import json
import asyncio
from typing import Dict, Any, Optional
from datetime import datetime
from openai import OpenAI
from ..config.settings import settings
from ..models.contract_summary import ContractSummary
class ContractSummaryService:
"""Service for extracting structured contract summaries using OpenAI GPT-4o"""
def __init__(self):
self.client = OpenAI(api_key=settings.openai_api_key)
self.max_chars = settings.max_document_chars
async def extract_contract_summary(self, document_text: str, filename: str) -> Dict[str, Any]:
"""
Extract structured contract summary using OpenAI GPT-4o
Args:
document_text (str): Full text of the document
filename (str): Name of the file being processed
Returns:
dict: Extraction result with success flag and summary data
"""
try:
print(f"Extracting contract summary from: {filename}")
# Check document length and return an error result if it is too long
if len(document_text) > self.max_chars:
error_msg = f"Document too large: {len(document_text)} characters exceeds maximum of {self.max_chars} characters"
print(f"Error: {error_msg}")
return {
"success": False,
"error": error_msg,
"filename": filename
}
# Get the contract schema prompt
contract_schema = self._get_contract_schema()
# Create the prompt
prompt = f"""
Document filename: {filename}
Document content:
{document_text}
{contract_schema}
"""
# Call OpenAI API
response = await asyncio.to_thread(
self.client.chat.completions.create,
model="gpt-4o",
messages=[
{
"role": "system",
"content": "You are a contract analysis expert. Extract contract information accurately and return only valid JSON."
},
{
"role": "user",
"content": prompt
}
],
max_tokens=4000,
temperature=0.1
)
# Extract the response
content = (response.choices[0].message.content or "").strip()
# Try to parse the JSON
try:
summary_json = json.loads(content)
print(f"Successfully extracted summary for {filename}")
return {
"success": True,
"summary": summary_json,
"filename": filename
}
except json.JSONDecodeError as e:
print(f"JSON parsing error: {e}")
print(f"Raw response length: {len(content)} characters")
# Try to extract JSON from the response if it's wrapped in text
import re
json_match = re.search(r'\{.*\}', content, re.DOTALL)
if json_match:
try:
summary_json = json.loads(json_match.group())
print(f"Successfully extracted JSON from wrapped response for {filename}")
return {
"success": True,
"summary": summary_json,
"filename": filename
}
except json.JSONDecodeError:
pass
return {
"success": False,
"error": f"Failed to parse JSON response: {e}. Response length: {len(content)} characters",
"filename": filename
}
except Exception as e:
print(f"Error calling OpenAI API: {e}")
return {
"success": False,
"error": f"OpenAI API error: {str(e)}",
"filename": filename
}
def validate_contract_summary(self, summary_json: Dict[str, Any]) -> ContractSummary:
"""
Validate and convert raw JSON to structured ContractSummary model
Args:
summary_json (dict): Raw JSON from OpenAI
Returns:
ContractSummary: Validated summary object
"""
try:
# Convert any None values to "N/A (Not found in Doc)" for consistency
def convert_none_values(obj):
if isinstance(obj, dict):
return {k: convert_none_values(v) for k, v in obj.items()}
elif isinstance(obj, list):
return [convert_none_values(v) for v in obj]
elif obj is None or obj == "":
return "N/A (Not found in Doc)"
return obj
cleaned_summary = convert_none_values(summary_json)
# Validate using Pydantic model
contract_summary = ContractSummary(**cleaned_summary)
return contract_summary
except Exception as e:
print(f"Error validating contract summary: {e}")
# Return empty summary with error indication
return ContractSummary(
contract_type="Error in processing",
overview_purpose=f"Error validating summary: {str(e)}"
)
def _get_contract_schema(self) -> str:
"""Get the contract analysis schema prompt"""
return """
Please extract the following information from this contract document and return it in JSON format.
If any information is not found in the document, use "N/A (Not found in Doc)" as the value.
Required fields:
{
"contract_type": "Type of contract (MSA, SOW, Supplier Contract, Vendor Contract, Licensing Agreement, NDA, etc.)",
"overview_purpose": "Brief overview and purpose of the contract",
"relevant_account": "Client account name or relevant account",
"in_studio_name": "In-Studio Name (e.g., The Mix)",
"client_sender_name": "Client/Sender Name",
"client_sender_address": "Client/Sender Address",
"agency_name": "Agency Name (OLM, IIG, AYS, BTG, etc.)",
"agency_address": "Agency Address",
"dates_signed": "Date(s) when contract was signed",
"terms": "Contract terms/duration",
"date_expired": "Contract expiration date",
"pricing_payment_terms": "Pricing and payment terms overview",
"scope_of_work": {
"summary_tasks_deliverables": "Summary of tasks and deliverables",
"key_dates": "Key dates and milestones",
"key_kpis": "Key KPIs or performance indicators"
},
"terms_and_termination": {
"duration": "Look for contract duration, term length, effective period, validity period, or how long this agreement remains in force. Search for phrases like 'term of', 'duration', 'effective for', 'valid until', 'expires on', or specific time periods",
"termination_conditions": "Find termination clauses, conditions under which either party can end the agreement, notice periods required for termination, breach conditions, or circumstances that allow contract cancellation. Look for sections titled 'Termination', 'End of Agreement', or phrases like 'may be terminated', 'notice of termination'",
"penalties": "Search for financial penalties, liquidated damages, fees, or costs associated with early termination, breach of contract, or cancellation. Look for monetary amounts, penalty clauses, or consequences of termination"
},
"payment_terms": {
"payment_method": "Search for how payments are processed - check for bank transfer, wire transfer, check, ACH, credit card, electronic payment, or specific payment platforms. Look for banking details, payment processing instructions, or remittance information",
"payment_schedule": "Find when payments are due - look for payment frequency (monthly, quarterly, annually), due dates, billing cycles, invoice terms, or payment timing. Search for phrases like 'payable within', 'due on', 'payment schedule', or specific dates",
"pricing_details": "Look for detailed pricing structure including rates, fees, hourly rates, project costs, retainer amounts, or cost breakdowns. Search for currency amounts, pricing tables, rate cards, or cost schedules",
"mark_ups": "Find any markup percentages, additional fees, surcharges, or percentage-based charges applied to costs. Look for percentage symbols, markup clauses, or additional charges",
"payment_schedules": "Look for detailed payment timing including milestone payments, installment schedules, advance payments, or progress-based payment structures. Search for payment phases or staged payment plans",
"late_payment_penalties": "Search for late payment fees, interest charges, penalty rates, or consequences of delayed payment. Look for percentage rates for late fees, daily charges, or penalty clauses",
"discounts": "Find any available discounts, early payment incentives, volume discounts, or reduced rates for specific conditions. Look for percentage discounts or preferential pricing terms"
},
"liability_indemnification": {
"responsibilities_liabilities": "Search for sections defining each party's responsibilities, obligations, duties, or liabilities. Look for phrases like 'responsible for', 'liable for', 'obligations include', 'duties of', or specific responsibility assignments",
"damages_losses": "Find who bears responsibility for damages, losses, claims, or financial harm. Look for liability caps, exclusions, limitations of liability, or damage allocation clauses. Search for monetary limits or damage responsibility",
"indemnification_clauses": "Look for indemnification provisions, hold harmless clauses, or protection from third-party claims. Search for phrases like 'indemnify', 'hold harmless', 'defend against', or protection from lawsuits and claims"
},
"confidentiality": {
"scope": "Look for what information is considered confidential - proprietary data, trade secrets, business information, client data, technical information, or specific categories of protected information. Search for definitions of confidential information",
"duration": "Find how long confidentiality obligations last - look for time periods, survival clauses, or duration of non-disclosure obligations. Search for phrases like 'perpetual', 'for X years', 'survives termination', or confidentiality periods",
"exceptions": "Search for exceptions to confidentiality - publicly available information, independently developed information, or legally required disclosures. Look for carve-outs or situations where confidentiality doesn't apply",
"disclosures_by_law": "Find circumstances where confidential information may be disclosed due to legal requirements, court orders, regulatory demands, or government requests. Look for legal disclosure provisions",
"breach_consequences": "Search for penalties, damages, or consequences for violating confidentiality obligations. Look for monetary damages, injunctive relief, or specific penalties for breach of non-disclosure"
},
"intellectual_property": {
"licensor": "Find who is granting intellectual property rights - the party providing licenses, copyrights, trademarks, or other IP rights. Look for the entity or person licensing their intellectual property",
"licensee": "Identify who is receiving intellectual property rights - the party getting permission to use copyrights, trademarks, patents, or other IP. Look for the recipient of IP licensing",
"terms_renewal": "Search for intellectual property renewal terms, license extension conditions, or how IP rights can be renewed or continued. Look for renewal clauses, automatic extensions, or renewal procedures",
"pricing": "Find costs associated with intellectual property use - licensing fees, royalties, IP-related payments, or costs for using copyrighted or trademarked materials. Look for IP pricing structures",
"definitions": "Look for definitions of intellectual property terms, what constitutes IP in this agreement, or specific IP categories covered. Search for IP definitions and scope of protected materials",
"scope": "Find what intellectual property rights are included - copyrights, trademarks, patents, trade secrets, proprietary information, or specific IP assets covered by the agreement",
"duration": "Search for how long intellectual property rights last - license duration, IP protection periods, or time limits on IP usage. Look for IP term lengths or expiration dates",
"territory": "Find geographical limitations on IP rights - specific countries, regions, or territories where IP rights apply. Look for geographic restrictions or worldwide rights",
"use_ownership_rights": "Search for permitted uses of intellectual property, ownership transfers, usage restrictions, or what can be done with the licensed IP. Look for usage rights and ownership provisions"
},
"dispute_resolution": {
"methods": "Search for how disputes will be resolved - negotiation, mediation, arbitration, litigation, or alternative dispute resolution methods. Look for dispute resolution procedures or escalation processes",
"mediation_options": "Find if mediation is required or available for resolving disputes - look for mediation clauses, mediator selection, or mediation procedures. Search for mediation requirements or options",
"arbitration_options": "Look for arbitration clauses, arbitration requirements, arbitrator selection procedures, or binding arbitration provisions. Search for arbitration rules or arbitration organization references",
"litigation_options": "Find court jurisdiction, governing law, or litigation procedures if disputes go to court. Look for jurisdiction clauses, court selection, or legal venue specifications"
},
"warranties_representations": {
"service_standards": "Look for quality standards, performance expectations, service level agreements, or specific standards that services must meet. Search for performance metrics, quality requirements, or service benchmarks",
"service_assurances": "Find warranties, guarantees, representations, or assurances about service quality, performance, or outcomes. Look for warranty clauses, service guarantees, or performance assurances"
},
"compliance_with_laws": {
"relevant_laws": "Search for specific laws, regulations, statutes, or legal requirements that parties must comply with. Look for regulatory compliance, legal standards, or specific legislation mentioned in the contract",
"owner_obligations": "Find legal obligations, compliance responsibilities, or regulatory duties that each party must fulfill. Look for compliance requirements, legal duties, or regulatory obligations"
},
"amendments_versions": {
"change_management": "Look for how contract changes are managed - amendment procedures, modification processes, or change control mechanisms. Search for how the contract can be updated, modified, or amended",
"written_consent": "Find requirements for written consent, signatures, or formal approval needed for contract changes. Look for amendment approval processes or consent requirements for modifications"
},
"assignment_subcontracting": {
"delegation_assignment": "Search for rules about assigning contract rights, delegating obligations, or transferring responsibilities to third parties. Look for assignment clauses, subcontracting permissions, or restrictions on transferring contract duties. Find phrases like 'may not assign', 'assignment requires consent', or subcontracting limitations"
}
}
IMPORTANT: Return ONLY valid JSON. Do not include any explanatory text before or after the JSON.
"""
# Global service instance
contract_summary_service = ContractSummaryService()
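
The parse-then-fallback logic in `extract_contract_summary` and the placeholder substitution in `validate_contract_summary` can be illustrated in isolation. This is a sketch under stated assumptions: `parse_model_json` and `fill_missing` are hypothetical helper names, the sample reply is invented, and the list handling in `fill_missing` is an addition for illustration rather than something the service is known to need.

```python
import json
import re
from typing import Any, Optional

PLACEHOLDER = "N/A (Not found in Doc)"


def parse_model_json(content: str) -> Optional[dict]:
    """Parse a model reply as JSON, tolerating text wrapped around the object."""
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        # Greedy match from the first "{" to the last "}" in the reply
        match = re.search(r"\{.*\}", content, re.DOTALL)
        if match is None:
            return None
        try:
            parsed = json.loads(match.group())
        except json.JSONDecodeError:
            return None
    return parsed if isinstance(parsed, dict) else None


def fill_missing(value: Any) -> Any:
    """Replace None and empty strings with the placeholder, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {k: fill_missing(v) for k, v in value.items()}
    if isinstance(value, list):
        return [fill_missing(v) for v in value]
    if value is None or value == "":
        return PLACEHOLDER
    return value


# Invented reply with prose around the JSON object
reply = 'Here is the summary: {"contract_type": "NDA", "terms": null, "scope_of_work": {"key_dates": ""}} Hope this helps.'
summary = fill_missing(parse_model_json(reply))
```

Returning `None` for an unparseable reply keeps the decision about how to report the failure with the caller.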


@ -0,0 +1,209 @@
import os
import uuid
import shutil
from typing import List, Dict, Any, Optional
from pathlib import Path
import aiofiles
from fastapi import UploadFile, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase
from ..config.settings import settings
from ..models.document import DocumentCreate, DocumentInDB
from ..models.user import UserInDB
from ..utils.file_utils import validate_file, get_file_info
class DocumentProcessor:
def __init__(self):
self.upload_dir = Path(settings.upload_dir)
self.allowed_extensions = {
'.pdf', '.docx', '.doc', '.txt', '.csv', '.json', '.html', '.md', '.rtf'
}
self.max_file_size = 50 * 1024 * 1024 # 50MB
async def process_upload(
self,
file: UploadFile,
index_id: str,
user: UserInDB,
db: AsyncIOMotorDatabase
) -> DocumentInDB:
"""Process uploaded file and save to database"""
# Validate file
await self._validate_file(file)
# Generate unique filename
file_extension = Path(file.filename).suffix.lower()
unique_filename = f"{uuid.uuid4()}{file_extension}"
# Create index-specific upload directory
index_upload_dir = self.upload_dir / index_id
index_upload_dir.mkdir(parents=True, exist_ok=True)
# Save file
file_path = index_upload_dir / unique_filename
await self._save_file(file, file_path)
# Create document record
document_data = DocumentCreate(
filename=unique_filename,
original_filename=file.filename,
file_size=file.size,
content_type=file.content_type,
index_id=index_id,
uploaded_by=user.id
)
document_dict = document_data.model_dump()
document_dict["file_path"] = str(file_path)
document_dict["processing_status"] = "pending"
document_dict["embedding_status"] = "pending"
document_dict["metadata"] = {}
document_dict["chunk_count"] = 0
# Save to database
result = await db.documents.insert_one(document_dict)
document_dict["_id"] = result.inserted_id
return DocumentInDB(**document_dict)
async def _validate_file(self, file: UploadFile):
"""Validate uploaded file"""
if not file.filename:
raise HTTPException(status_code=400, detail="No file provided")
# Check file extension
file_extension = Path(file.filename).suffix.lower()
if file_extension not in self.allowed_extensions:
raise HTTPException(
status_code=400,
detail=f"File type {file_extension} not supported. Allowed types: {', '.join(sorted(self.allowed_extensions))}"
)
# Check file size
if file.size is not None and file.size > self.max_file_size:
raise HTTPException(
status_code=400,
detail=f"File too large. Maximum size: {self.max_file_size / (1024*1024):.1f}MB"
)
async def _save_file(self, file: UploadFile, file_path: Path):
"""Save uploaded file to disk"""
try:
async with aiofiles.open(file_path, 'wb') as f:
content = await file.read()
await f.write(content)
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Error saving file: {str(e)}"
)
async def delete_document(
self,
document_id: str,
db: AsyncIOMotorDatabase
) -> bool:
"""Delete document and associated file with complete cleanup"""
from bson import ObjectId
# Get document
document = await db.documents.find_one({"_id": ObjectId(document_id)})
if not document:
return False
index_id = document["index_id"]
# Delete embeddings from vector store
try:
from .llama_processor import llama_processor
await llama_processor.delete_document_embeddings(document_id, index_id)
except Exception as e:
print(f"Error deleting embeddings for document {document_id}: {e}")
# Delete file
file_path = Path(document["file_path"])
if file_path.exists():
try:
file_path.unlink()
except Exception as e:
print(f"Error deleting file {file_path}: {e}")
# Note: Cache clearing removed - caching is disabled for data freshness
# Delete document record
result = await db.documents.delete_one({"_id": ObjectId(document_id)})
return result.deleted_count > 0
async def get_documents_by_index(
self,
index_id: str,
db: AsyncIOMotorDatabase
) -> List[DocumentInDB]:
"""Get all documents for an index"""
documents = []
cursor = db.documents.find({"index_id": index_id})
async for doc in cursor:
documents.append(DocumentInDB(**doc))
return documents
async def update_processing_status(
self,
document_id: str,
status: str,
metadata: Optional[Dict[str, Any]] = None,
db: Optional[AsyncIOMotorDatabase] = None
):
"""Update document processing status"""
from bson import ObjectId
from datetime import datetime
update_data = {
"processing_status": status,
"updated_at": datetime.utcnow()
}
if metadata:
update_data["metadata"] = metadata
# Store parsed text if available
if "parsed_text" in metadata:
update_data["parsed_text"] = metadata["parsed_text"]
if "chunk_count" in metadata:
update_data["chunk_count"] = metadata["chunk_count"]
await db.documents.update_one(
{"_id": ObjectId(document_id)},
{"$set": update_data}
)
async def update_embedding_status(
self,
document_id: str,
status: str,
metadata: Optional[Dict[str, Any]] = None,
db: Optional[AsyncIOMotorDatabase] = None
):
"""Update document embedding status"""
from bson import ObjectId
from datetime import datetime
update_data = {
"embedding_status": status,
"updated_at": datetime.utcnow()
}
if metadata:
# Store vector information if available
if "vector_ids" in metadata:
update_data["vector_ids"] = metadata["vector_ids"]
if "chunk_count" in metadata:
update_data["chunk_count"] = metadata["chunk_count"]
await db.documents.update_one(
{"_id": ObjectId(document_id)},
{"$set": update_data}
)
# Global processor instance
document_processor = DocumentProcessor()
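
The upload checks in `_validate_file` can be sketched as a pure function, which makes them easy to test without FastAPI. This is an illustrative sketch only: `check_upload` is a hypothetical name, it returns an error message instead of raising `HTTPException`, and accepting an unknown (`None`) size is an assumption about the desired behaviour, not something the original code states.

```python
from pathlib import Path
from typing import Optional

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".csv", ".json", ".html", ".md", ".rtf"}
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50MB, the limit used by DocumentProcessor


def check_upload(filename: Optional[str], size: Optional[int]) -> Optional[str]:
    """Return an error message for a rejected upload, or None if it passes."""
    if not filename:
        return "No file provided"
    # Compare the lower-cased suffix so "Contract.PDF" is accepted
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        allowed = ", ".join(sorted(ALLOWED_EXTENSIONS))
        return f"File type {extension} not supported. Allowed types: {allowed}"
    # Assumption: an unknown size (None) is accepted; reject only a known, oversized file
    if size is not None and size > MAX_FILE_SIZE:
        return f"File too large. Maximum size: {MAX_FILE_SIZE / (1024 * 1024):.1f}MB"
    return None
```

For example, `check_upload("Contract.PDF", 1024)` passes, while `check_upload("run.exe", 10)` returns a message listing the allowed types in sorted order.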


@ -0,0 +1,881 @@
import os
import uuid
import asyncio
from typing import List, Dict, Any, Optional, Union
from pathlib import Path
import aiofiles
from fastapi import UploadFile, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase
from bson import ObjectId
from datetime import datetime
from llama_index.core import (
VectorStoreIndex,
StorageContext,
Settings,
Document as LlamaDocument
)
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.core.embeddings import BaseEmbedding
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_cloud_services import LlamaParse
import chromadb
from chromadb.config import Settings as ChromaSettings
from chromadb.utils import embedding_functions
from ..config.settings import settings
from ..models.document import DocumentInDB, DocumentCreate
from ..core.chroma_client import chroma_singleton
from ..models.index import IndexInDB
from ..models.user import UserInDB
from ..utils.file_utils import validate_file, get_file_info
from ..services.contract_summary_service import contract_summary_service
class LlamaProcessor:
def __init__(self):
self.upload_dir = Path(settings.upload_dir)
self.indices_dir = Path(settings.indices_dir)
self.allowed_extensions = {
'.pdf', '.docx', '.doc', '.txt', '.csv', '.json', '.html', '.md', '.rtf'
}
self.max_file_size = 50 * 1024 * 1024 # 50MB
# Initialize LlamaIndex components
self._setup_llama_index()
# ChromaDB client managed by singleton
def get_chroma_client(self):
"""Get or create ChromaDB client using shared singleton"""
chroma_db_path = str(self.indices_dir / "chroma_db")
return chroma_singleton.get_client(chroma_db_path)
def _setup_llama_index(self):
"""Setup LlamaIndex components"""
# Configure OpenAI
Settings.llm = OpenAI(
model="gpt-4o",
api_key=settings.openai_api_key,
temperature=0.1
)
# Configure embeddings
Settings.embed_model = OpenAIEmbedding(
model="text-embedding-3-small",
api_key=settings.openai_api_key
)
# Configure semantic text splitter
Settings.text_splitter = SemanticSplitterNodeParser.from_defaults(
embed_model=OpenAIEmbedding(
model="text-embedding-3-small",
api_key=settings.openai_api_key
),
buffer_size=2,
breakpoint_percentile_threshold=70
)
async def process_single_file(
self,
file: UploadFile,
index_id: str,
user: UserInDB,
db: AsyncIOMotorDatabase,
custom_name: Optional[str] = None
) -> DocumentInDB:
"""Process a single uploaded file"""
# Validate file
await self._validate_file(file)
# Generate unique filename
file_extension = Path(file.filename).suffix.lower()
unique_filename = f"{uuid.uuid4()}{file_extension}"
# Create index-specific upload directory
index_upload_dir = self.upload_dir / index_id
index_upload_dir.mkdir(parents=True, exist_ok=True)
# Save file
file_path = index_upload_dir / unique_filename
await self._save_file(file, file_path)
# Create document record
document_name = custom_name or file.filename
document_data = DocumentCreate(
filename=unique_filename,
original_filename=document_name,
file_size=file.size,
content_type=file.content_type,
index_id=index_id,
uploaded_by=user.id
)
document_dict = document_data.model_dump()
document_dict["file_path"] = str(file_path)
document_dict["processing_status"] = "pending"
document_dict["embedding_status"] = "pending"
document_dict["summary_status"] = "pending"
document_dict["metadata"] = {}
document_dict["created_at"] = datetime.utcnow()
document_dict["updated_at"] = datetime.utcnow()
# Save to database
result = await db.documents.insert_one(document_dict)
document_dict["_id"] = result.inserted_id
document = DocumentInDB(**document_dict)
# Process document asynchronously
asyncio.create_task(self._process_document_async(document, db))
return document
async def process_multiple_files(
self,
files: List[UploadFile],
index_id: str,
user: UserInDB,
db: AsyncIOMotorDatabase,
base_name: str
) -> List[DocumentInDB]:
"""Process multiple uploaded files"""
documents = []
for i, file in enumerate(files, 1):
# Generate custom name with serial number
file_extension = Path(file.filename).suffix.lower()
custom_name = f"{base_name}_{i:03d}{file_extension}"
document = await self.process_single_file(
file, index_id, user, db, custom_name
)
documents.append(document)
return documents
async def _process_document_async(self, document: DocumentInDB, db: AsyncIOMotorDatabase):
"""Process document asynchronously"""
print(f"Starting processing for document {document.id}: {document.original_filename}")
try:
# Update status to processing
print(f"Setting document {document.id} to processing status")
await self._update_document_status(
document.id, "processing", "pending", "pending", db
)
# Small delay to ensure status update is committed
await asyncio.sleep(0.1)
# Parse document text
print(f"Parsing document text for {document.id}")
parsed_text = await self._parse_document_text(document.file_path)
print(f"Parsed text length: {len(parsed_text)} characters")
# Update parsing status
await self._update_document_status(
document.id, "completed", "processing", "pending", db
)
# Create text chunks
print(f"Creating text chunks for {document.id}")
chunks = await self._create_text_chunks(parsed_text, document.index_id)
print(f"Created {len(chunks)} chunks")
# Create embeddings and store in vector database
print(f"Creating embeddings for {document.id}")
vector_ids = await self._create_embeddings(
chunks, document.index_id, str(document.id)
)
print(f"Created {len(vector_ids)} embeddings")
# Update document with parsed data
print(f"Updating document {document.id} with parsed data")
await self._update_document_with_parsed_data(
document.id, parsed_text, chunks, vector_ids, db
)
# Update status to completed (core processing done)
print(f"Completing processing for document {document.id}")
await self._update_document_status(
document.id, "completed", "completed", "pending", db
)
# Extract contract summary (non-fatal: a failure here does not fail the document)
print(f"Extracting contract summary for {document.id}")
try:
await self._extract_contract_summary(
document.id, parsed_text, document.original_filename, db
)
except Exception as summary_error:
print(f"Warning: Contract summary extraction failed for {document.id}: {summary_error}")
# Mark summary as failed but don't fail the entire document
await self._update_document_status(
document.id, "completed", "completed", "failed", db
)
print(f"Successfully processed document {document.id}")
except Exception as e:
print(f"Error processing document {document.id}: {str(e)}")
import traceback
traceback.print_exc()
await self._update_document_status(
document.id, "failed", "failed", "failed", db
)
# Store error in metadata
await db.documents.update_one(
{"_id": ObjectId(document.id)},
{"$set": {"metadata.error": str(e)}}
)
async def _parse_document_text(self, file_path: str) -> str:
"""Parse text from document using LlamaParse with premium mode (async)"""
file_path = Path(file_path)
print(f"Parsing file: {file_path}")
print(f"File exists: {file_path.exists()}")
print(f"File size: {file_path.stat().st_size if file_path.exists() else 'N/A'}")
# Check if LlamaParse API key is available
if not settings.llamaparse_api_key:
raise HTTPException(
status_code=500,
detail="LlamaParse API key is required for document processing"
)
try:
print("Using LlamaParse with premium mode (async)")
# Run LlamaParse in a thread pool to avoid blocking the event loop
def _run_llamaparse():
parser = LlamaParse(
api_key=settings.llamaparse_api_key,
premium_mode=True,
result_type="markdown",
verbose=True
)
return parser.load_data(str(file_path))
# Execute the synchronous LlamaParse call in a thread pool
import asyncio
loop = asyncio.get_running_loop()
documents = await loop.run_in_executor(None, _run_llamaparse)
print(f"LlamaParse loaded {len(documents)} documents from file")
# Combine all document text
text_result = "\n\n".join(doc.text for doc in documents).strip()
print(f"Final parsed text length: {len(text_result)} characters")
if len(text_result) == 0:
raise Exception("No text extracted from document via LlamaParse")
return text_result
except Exception as e:
print(f"Error in _parse_document_text with LlamaParse: {str(e)}")
raise HTTPException(
status_code=500,
detail=f"Error parsing document with LlamaParse: {str(e)}"
)
async def _create_text_chunks(self, text: str, index_id: str) -> List[str]:
"""Create text chunks using LlamaIndex SemanticSplitter"""
try:
print(f"Creating semantic chunks for text of length {len(text)}")
# Create semantic splitter with OpenAI embeddings
semantic_splitter = SemanticSplitterNodeParser.from_defaults(
embed_model=OpenAIEmbedding(
model="text-embedding-3-small",
api_key=settings.openai_api_key
),
buffer_size=2,
breakpoint_percentile_threshold=70
)
# Create a document and split it semantically
llama_doc = LlamaDocument(text=text)
nodes = semantic_splitter.get_nodes_from_documents([llama_doc])
# Extract text from nodes
chunks = [node.text for node in nodes]
print(f"Created {len(chunks)} semantic chunks")
if len(chunks) == 0:
raise Exception("No semantic chunks created from text")
return chunks
except Exception as e:
print(f"Error in _create_text_chunks: {str(e)}")
raise HTTPException(
status_code=500,
detail=f"Error creating semantic text chunks: {str(e)}"
)
async def _create_embeddings(
self,
chunks: List[str],
index_id: str,
document_id: str
) -> List[str]:
"""Create embeddings and store using LlamaIndex"""
try:
collection_name = f"index_{index_id}"
print(f"🔍 DEBUG - Creating embeddings using LlamaIndex for collection: {collection_name}")
# Create LlamaIndex documents from chunks
documents = []
vector_ids = []
for i, chunk in enumerate(chunks):
chunk_id = f"{document_id}_{i}"
vector_ids.append(chunk_id)
# Create LlamaIndex document with metadata
doc = LlamaDocument(
text=chunk,
metadata={
"document_id": document_id,
"chunk_index": i,
"index_id": index_id,
"chunk_id": chunk_id
}
)
documents.append(doc)
print(f" - Created {len(documents)} LlamaIndex documents")
print(f" - Document IDs: {vector_ids}")
print(f" - Document lengths: {[len(doc.text) for doc in documents]}")
# Get or create ChromaDB collection using LlamaIndex
client = self.get_chroma_client()
try:
chroma_collection = client.get_collection(name=collection_name)
current_count = chroma_collection.count()
print(f" - Found existing collection with {current_count} vectors")
except Exception:
# Create collection - let LlamaIndex handle the embedding function
print(f" - Creating new ChromaDB collection")
chroma_collection = client.create_collection(
name=collection_name,
metadata={"hnsw:space": "cosine"}
)
# Create LlamaIndex vector store and index
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Check if index exists
try:
# Try to load existing index
index = VectorStoreIndex.from_vector_store(
vector_store=vector_store,
storage_context=storage_context
)
print(f" - Loaded existing LlamaIndex VectorStoreIndex")
except Exception:
# Create new index
index = VectorStoreIndex.from_documents(
[], # Start empty
storage_context=storage_context
)
print(f" - Created new LlamaIndex VectorStoreIndex")
# Insert documents into the index (async to avoid blocking)
print(f" - Inserting {len(documents)} documents into LlamaIndex")
def _insert_documents():
for doc in documents:
print(f" - Inserting document chunk with metadata: {doc.metadata}")
index.insert(doc)
return True
# Run embedding creation in thread pool to avoid blocking
await asyncio.get_running_loop().run_in_executor(None, _insert_documents)
# Verify the final count
final_count = chroma_collection.count()
print(f" - Collection count after adding: {final_count}")
print(f" - Successfully added {len(vector_ids)} vectors for document {document_id}")
return vector_ids
except Exception as e:
print(f"Error in _create_embeddings: {str(e)}")
import traceback
traceback.print_exc()
raise HTTPException(
status_code=500,
detail=f"Error creating embeddings: {str(e)}"
)
async def _update_document_status(
self,
document_id: str,
processing_status: str,
embedding_status: str,
summary_status: str,
db: AsyncIOMotorDatabase
):
"""Update document processing status"""
await db.documents.update_one(
{"_id": ObjectId(document_id)},
{"$set": {
"processing_status": processing_status,
"embedding_status": embedding_status,
"summary_status": summary_status,
"updated_at": datetime.utcnow()
}}
)
async def _update_document_with_parsed_data(
self,
document_id: str,
parsed_text: str,
chunks: List[str],
vector_ids: List[str],
db: AsyncIOMotorDatabase
):
"""Update document with parsed data"""
await db.documents.update_one(
{"_id": ObjectId(document_id)},
{"$set": {
"parsed_text": parsed_text,
"text_chunks": chunks,
"chunk_count": len(chunks),
"vector_ids": vector_ids,
"updated_at": datetime.utcnow()
}}
)
async def _extract_contract_summary(
self,
document_id: str,
parsed_text: str,
filename: str,
db: AsyncIOMotorDatabase
):
"""Extract contract summary asynchronously"""
try:
# Update summary status to processing
await db.documents.update_one(
{"_id": ObjectId(document_id)},
{"$set": {
"summary_status": "processing",
"updated_at": datetime.utcnow()
}}
)
# Extract summary using the contract summary service
result = await contract_summary_service.extract_contract_summary(
parsed_text, filename
)
if result["success"]:
# Validate the summary
validated_summary = contract_summary_service.validate_contract_summary(
result["summary"]
)
# Store in database
await db.documents.update_one(
{"_id": ObjectId(document_id)},
{"$set": {
"contract_summary": validated_summary.dict(),
"summary_status": "completed",
"summary_created_at": datetime.utcnow(),
"updated_at": datetime.utcnow()
}}
)
print(f"Successfully extracted contract summary for {document_id}")
else:
# Store error
await db.documents.update_one(
{"_id": ObjectId(document_id)},
{"$set": {
"summary_status": "failed",
"metadata.summary_error": result.get("error", "Unknown error"),
"updated_at": datetime.utcnow()
}}
)
print(f"Failed to extract contract summary for {document_id}: {result.get('error')}")
except Exception as e:
print(f"Error extracting contract summary for {document_id}: {str(e)}")
import traceback
traceback.print_exc()
await db.documents.update_one(
{"_id": ObjectId(document_id)},
{"$set": {
"summary_status": "failed",
"metadata.summary_error": str(e),
"updated_at": datetime.utcnow()
}}
)
async def _validate_file(self, file: UploadFile):
"""Validate uploaded file"""
if not file.filename:
raise HTTPException(status_code=400, detail="No file provided")
# Check file extension
file_extension = Path(file.filename).suffix.lower()
if file_extension not in self.allowed_extensions:
raise HTTPException(
status_code=400,
detail=f"File type {file_extension} not supported. Allowed types: {', '.join(self.allowed_extensions)}"
)
# Check file size
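# UploadFile.size can be None when the client sends no length information
if file.size is not None and file.size > self.max_file_size: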
raise HTTPException(
status_code=400,
detail=f"File too large. Maximum size: {self.max_file_size / (1024*1024):.1f}MB"
)
async def _save_file(self, file: UploadFile, file_path: Path):
"""Save uploaded file to disk"""
try:
async with aiofiles.open(file_path, 'wb') as f:
content = await file.read()
await f.write(content)
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Error saving file: {str(e)}"
)
async def query_documents(
self,
query: str,
index_id: str,
top_k: int = 10
) -> List[Dict[str, Any]]:
"""Query documents using LlamaIndex VectorStoreIndex"""
try:
collection_name = f"index_{index_id}"
# Ensure consistent embedding model for LlamaIndex
Settings.embed_model = OpenAIEmbedding(
model="text-embedding-3-small",
api_key=settings.openai_api_key
)
# DEBUG: LlamaIndex query setup
print(f"🔍 DEBUG - LlamaIndex Query Execution:")
print(f" - Query: '{query}'")
print(f" - Collection name: {collection_name}")
print(f" - Top K: {top_k}")
print(f" - Settings.embed_model: {type(Settings.embed_model).__name__}")
# Test embedding dimensions
try:
test_embedding = Settings.embed_model.get_text_embedding("test")
print(f" - Query embedding dimensions: {len(test_embedding)}")
except Exception as e:
print(f" - ERROR getting test embedding: {e}")
# Check if collection exists (without specifying embedding function)
try:
client = self.get_chroma_client()
chroma_collection = client.get_collection(name=collection_name)
collection_count = chroma_collection.count()
print(f" - Found collection with {collection_count} documents")
if collection_count == 0:
raise HTTPException(
status_code=404,
detail=f"Index '{index_id}' exists but contains no processed documents. Please upload and process documents first."
)
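except HTTPException:
# Preserve the more specific "contains no processed documents" 404 raised above
raise
except Exception as collection_error: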
print(f" - Collection {collection_name} not found: {collection_error}")
raise HTTPException(
status_code=404,
detail=f"Vector collection for index '{index_id}' does not exist. Documents may still be processing or failed to process. Please check the admin panel for processing status."
)
# Create LlamaIndex VectorStore and Index
try:
print(f" - Creating LlamaIndex VectorStore...")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Load the index
index = VectorStoreIndex.from_vector_store(
vector_store=vector_store,
storage_context=storage_context
)
# Create query engine
query_engine = index.as_query_engine(
similarity_top_k=top_k,
response_mode="no_text" # Only return source nodes, no generated text
)
print(f" - Executing LlamaIndex query...")
response = query_engine.query(query)
print(f" - Query successful")
# Extract and format results from source nodes
formatted_results = []
if hasattr(response, 'source_nodes') and response.source_nodes:
for i, node in enumerate(response.source_nodes):
# Get similarity score (higher is better)
similarity_score = node.score if hasattr(node, 'score') and node.score is not None else 0.5
# Convert similarity score to distance (lower is better)
# LlamaIndex typically returns scores between 0 and 1, where 1 is most similar
distance = 1.0 - similarity_score
formatted_results.append({
"content": node.text,
"metadata": node.metadata,
"score": similarity_score,
"distance": distance,
"document_id": node.metadata.get("document_id", "unknown"),
"chunk_index": node.metadata.get("chunk_index", i)
})
print(f" - Retrieved {len(formatted_results)} relevant chunks")
print(f" - Similarity scores: {[r['score'] for r in formatted_results]}")
print(f" - Distance values: {[r['distance'] for r in formatted_results]}")
else:
print(f" - No source nodes found in response")
except Exception as query_error:
print(f" - ERROR during LlamaIndex query: {query_error}")
print(f" - Error type: {type(query_error).__name__}")
raise query_error
if not formatted_results:
raise HTTPException(
status_code=404,
detail=f"No relevant documents found for query '{query}' in index '{index_id}'. Try rephrasing your question or ensure documents are properly processed."
)
return formatted_results
except HTTPException:
# Re-raise HTTPExceptions as-is
raise
except Exception as e:
print(f"Unexpected error in query_documents: {e}")
raise HTTPException(
status_code=500,
detail=f"Error querying documents: {str(e)}"
)
def generate_response(
self,
query: str,
context_chunks: List[str],
index_id: str
) -> str:
"""Generate response using OpenAI with retrieved context"""
try:
# Prepare context
context = "\n\n".join(context_chunks)
# Create prompt
prompt = f"""Based on the following context, answer the user's question. If the answer is not in the context, say "I don't have enough information to answer that question."
Context:
{context}
Question: {query}
Answer:"""
# Generate response using OpenAI
llm = OpenAI(
model="gpt-4o",
api_key=settings.openai_api_key,
temperature=0.1
)
# Use sync completion
response = llm.complete(prompt)
return response.text
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Error generating response: {str(e)}"
)
async def delete_document_embeddings(
self,
document_id: str,
index_id: str
):
"""Delete document embeddings from vector store"""
try:
collection_name = f"index_{index_id}"
print(f"🗑️ DEBUG - Deleting embeddings for document {document_id} from collection {collection_name}")
collection = self.get_chroma_client().get_collection(name=collection_name)
# Get collection count before deletion
count_before = collection.count()
print(f" - Collection count before deletion: {count_before}")
# Strategy 1: Try to delete by document_id metadata
try:
results = collection.get(where={"document_id": document_id})
print(f" - Strategy 1 - Found {len(results['ids']) if results['ids'] else 0} vectors by document_id")
if results["ids"]:
collection.delete(ids=results["ids"])
count_after = collection.count()
print(f" - Strategy 1 - Successfully deleted {count_before - count_after} vectors")
if count_before != count_after:
return # Success, exit early
except Exception as e:
print(f" - Strategy 1 failed: {e}")
# Strategy 2: Try to delete by chunk_id pattern (document_id_*)
try:
# Get all vectors and filter by chunk_id pattern
all_results = collection.get()
matching_ids = []
if all_results["ids"] and all_results["metadatas"]:
for vid, metadata in zip(all_results["ids"], all_results["metadatas"]):
if metadata and "chunk_id" in metadata:
if metadata["chunk_id"].startswith(f"{document_id}_"):
matching_ids.append(vid)
print(f" - Strategy 2 - Found {len(matching_ids)} vectors by chunk_id pattern")
if matching_ids:
collection.delete(ids=matching_ids)
count_after = collection.count()
print(f" - Strategy 2 - Successfully deleted {count_before - count_after} vectors")
if count_before != count_after:
return # Success, exit early
except Exception as e:
print(f" - Strategy 2 failed: {e}")
# Strategy 3: Try to delete by any metadata containing document_id
try:
all_results = collection.get()
matching_ids = []
if all_results["ids"] and all_results["metadatas"]:
for vid, metadata in zip(all_results["ids"], all_results["metadatas"]):
if metadata:
# Check whether any metadata value equals the document_id
for key, value in metadata.items():
if str(value) == document_id:
matching_ids.append(vid)
break
print(f" - Strategy 3 - Found {len(matching_ids)} vectors by metadata scan")
if matching_ids:
collection.delete(ids=matching_ids)
count_after = collection.count()
print(f" - Strategy 3 - Successfully deleted {count_before - count_after} vectors")
if count_before != count_after:
return # Success, exit early
except Exception as e:
print(f" - Strategy 3 failed: {e}")
# If we get here, no vectors were deleted
print(f" - WARNING: No vectors were deleted for document {document_id}")
# Debug: Show some sample metadata to understand the structure
try:
sample_results = collection.get(limit=3)
print(f" - DEBUG: Sample metadata structures:")
for i, metadata in enumerate(sample_results.get("metadatas", [])[:3]):
print(f" - Sample {i}: {metadata}")
except Exception as e:
print(f" - DEBUG: Could not get sample metadata: {e}")
except Exception as e:
print(f"Error deleting embeddings for document {document_id}: {e}")
import traceback
traceback.print_exc()
def check_collection_exists(self, index_id: str) -> bool:
"""Check if ChromaDB collection exists for an index"""
try:
collection_name = f"index_{index_id}"
self.get_chroma_client().get_collection(name=collection_name)
return True
except Exception:
return False
def get_collection_info(self, index_id: str) -> Dict[str, Any]:
"""Get information about a ChromaDB collection"""
try:
collection_name = f"index_{index_id}"
collection = self.get_chroma_client().get_collection(name=collection_name)
return {
"exists": True,
"name": collection_name,
"count": collection.count(),
"metadata": collection.metadata
}
except Exception as e:
return {
"exists": False,
"name": f"index_{index_id}",
"count": 0,
"error": str(e)
}
async def generate_document_summary(self, text: str, filename: str) -> str:
"""Generate AI summary of a document"""
try:
# Check text length and return an error message if too long
if len(text) > settings.max_summary_chars:
error_msg = f"Document too large for summary: {len(text)} characters exceeds maximum of {settings.max_summary_chars} characters"
print(f"Error: {error_msg}")
return f"Error: {error_msg}"
# Create summarization prompt
prompt = f"""Please provide a concise summary of the following document "{filename}".
Focus on the main points, key information, and important details.
Keep the summary between 150-300 words.
Document content:
{text}
Summary:"""
# Generate summary using OpenAI via Settings.llm
response = await Settings.llm.acomplete(prompt)
summary = str(response).strip()
# Fallback if summary is too short
if len(summary) < 50:
summary = "Unable to generate detailed summary. This document contains text that may require manual review."
return summary
except Exception as e:
return f"Error generating summary: {str(e)}"
# Global processor instance
llama_processor = LlamaProcessor()
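The processor above hands synchronous work (the LlamaParse call and the index inserts) to a thread pool so the event loop stays responsive. A minimal sketch of that pattern follows, assuming Python 3.9+ where `asyncio.to_thread` is available. `slow_parse`, `parse_without_blocking`, and `parse_many` are hypothetical names used only for illustration; they are not part of this project.

```python
import asyncio
import time


def slow_parse(path: str) -> str:
    """Hypothetical stand-in for a blocking call such as parser.load_data()."""
    time.sleep(0.05)  # simulate blocking I/O
    return f"parsed:{path}"


async def parse_without_blocking(path: str) -> str:
    # asyncio.to_thread runs the sync function in the default thread pool,
    # much like loop.run_in_executor(None, slow_parse, path).
    return await asyncio.to_thread(slow_parse, path)


async def parse_many(paths: list[str]) -> list[str]:
    # Each blocking call runs in a worker thread, so the calls can overlap
    # while the event loop remains free to serve other coroutines.
    return await asyncio.gather(*(parse_without_blocking(p) for p in paths))


if __name__ == "__main__":
    print(asyncio.run(parse_many(["a.pdf", "b.pdf"])))
```

`asyncio.gather` returns results in the order the awaitables were passed, so the output list lines up with the input paths even if the threads finish out of order.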

@ -0,0 +1,303 @@
import os
import shutil
import json
import asyncio
from typing import Dict, Any, List, Optional
from pathlib import Path
from datetime import datetime
from llama_index.core import VectorStoreIndex, StorageContext, Settings
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
import chromadb
from ..config.settings import settings
from ..models.index import IndexInDB
from ..core.chroma_client import chroma_singleton
class RAGService:
def __init__(self):
self.openai_api_key = settings.openai_api_key
self.llamaparse_api_key = settings.llamaparse_api_key
self.indices_dir = Path(settings.indices_dir)
self.upload_dir = Path(settings.upload_dir)
# Configure LlamaIndex settings
Settings.llm = OpenAI(
model="gpt-4o",
api_key=self.openai_api_key,
temperature=0.1
)
Settings.embed_model = OpenAIEmbedding(
model="text-embedding-3-small",
api_key=self.openai_api_key
)
# Ensure directories exist
self.indices_dir.mkdir(parents=True, exist_ok=True)
def get_chroma_client(self):
"""Get or create ChromaDB client using shared singleton"""
chroma_db_path = str(self.indices_dir / "chroma_db")
return chroma_singleton.get_client(chroma_db_path)
# NOTE: Index creation is now handled by LlamaProcessor; this service no longer defines a method for it.
async def query_index(
self,
index_id: str,
query: str,
top_k: int = 10
) -> Dict[str, Any]:
"""Query an existing index"""
try:
index_dir = self.indices_dir / index_id
if not index_dir.exists():
return {
"success": False,
"message": f"Index {index_id} not found"
}
# Ensure consistent embedding model before querying
embedding_model = OpenAIEmbedding(
model="text-embedding-3-small",
api_key=self.openai_api_key
)
Settings.embed_model = embedding_model
# DEBUG: Log embedding model details
print(f"🔍 DEBUG - Embedding Model Configuration:")
print(f" - Model: text-embedding-3-small")
print(f" - API Key present: {bool(self.openai_api_key)}")
print(f" - Settings.embed_model: {type(Settings.embed_model).__name__}")
# Test embedding to get dimensions
try:
test_embedding = embedding_model.get_text_embedding("test")
print(f" - Test embedding dimensions: {len(test_embedding)}")
except Exception as e:
print(f" - ERROR getting test embedding: {e}")
# Load index (use consistent collection naming)
chroma_client = self.get_chroma_client()
collection_name = f"index_{index_id}"
# DEBUG: ChromaDB collection info
try:
chroma_collection = chroma_client.get_collection(name=collection_name)
collection_metadata = chroma_collection.metadata
collection_count = chroma_collection.count()
print(f"🔍 DEBUG - ChromaDB Collection:")
print(f" - Collection name: {collection_name}")
print(f" - Document count: {collection_count}")
print(f" - Collection metadata: {collection_metadata}")
# Try to peek at a few vectors to check dimensions
if collection_count > 0:
peek_result = chroma_collection.peek(limit=1)
if peek_result and 'embeddings' in peek_result and peek_result['embeddings']:
stored_dim = len(peek_result['embeddings'][0]) if peek_result['embeddings'][0] else "None"
print(f" - Stored vector dimensions: {stored_dim}")
else:
print(f" - No embeddings found in peek result")
except Exception as e:
print(f" - ERROR accessing collection: {e}")
raise e
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Load the index
index = VectorStoreIndex.from_vector_store(
vector_store=vector_store,
storage_context=storage_context
)
# Create query engine
query_engine = index.as_query_engine(
similarity_top_k=top_k,
response_mode="compact"
)
# DEBUG: Query execution details
print(f"🔍 DEBUG - Query Execution:")
print(f" - Query: '{query}'")
print(f" - Top K: {top_k}")
print(f" - Current Settings.embed_model: {type(Settings.embed_model).__name__}")
# Test query embedding before execution
try:
query_embedding = Settings.embed_model.get_text_embedding(query)
print(f" - Query embedding dimensions: {len(query_embedding)}")
except Exception as e:
print(f" - ERROR getting query embedding: {e}")
# Execute query
start_time = datetime.now()
print(f" - Starting query execution at {start_time}")
try:
response = query_engine.query(query)
print(f" - Query executed successfully")
except Exception as e:
print(f" - ERROR during query execution: {e}")
print(f" - Error type: {type(e).__name__}")
raise e
end_time = datetime.now()
# Extract source information
source_info = []
if hasattr(response, 'source_nodes'):
for node in response.source_nodes:
source_info.append({
"filename": node.metadata.get('filename', 'Unknown'),
"score": node.score,
"text_snippet": node.text[:200] + "..." if len(node.text) > 200 else node.text
})
return {
"success": True,
"response": str(response),
"sources": source_info,
"query_time": (end_time - start_time).total_seconds(),
"debug": {
"query": query,
"index_id": index_id,
"top_k": top_k,
"source_count": len(source_info)
}
}
except Exception as e:
print(f"Error querying index: {e}")
return {
"success": False,
"message": f"Error querying index: {str(e)}"
}
# NOTE: Document loading, LlamaParse processing, document reading, and adding
# documents to an index are now handled by LlamaProcessor; this service no
# longer defines methods for them.
async def delete_index(self, index_id: str) -> bool:
"""Delete an index and all associated files"""
try:
index_dir = self.indices_dir / index_id
if index_dir.exists():
shutil.rmtree(index_dir)
return True
return False
except Exception as e:
print(f"Error deleting index: {e}")
return False
async def delete_index_complete(self, index_id: str) -> Dict[str, Any]:
"""Complete index deletion including ChromaDB cleanup"""
try:
# Delete vector index files
file_success = await self.delete_index(index_id)
# Delete ChromaDB collection
chroma_client = self.get_chroma_client()
collection_name = f"index_{index_id}"
collection_deleted = False
try:
chroma_client.delete_collection(collection_name)
collection_deleted = True
print(f"Successfully deleted ChromaDB collection: {collection_name}")
except Exception as e:
print(f"Warning: Could not delete ChromaDB collection {collection_name}: {e}")
# Clean up orphaned metadata in shared ChromaDB database
metadata_cleaned = self._cleanup_chromadb_metadata(index_id)
return {
"success": True,
"message": "Index completely deleted",
"details": {
"files_deleted": file_success,
"collection_deleted": collection_deleted,
"metadata_cleaned": metadata_cleaned
}
}
except Exception as e:
return {
"success": False,
"message": f"Error during complete index deletion: {str(e)}"
}
def _cleanup_chromadb_metadata(self, index_id: str) -> bool:
"""Clean up orphaned metadata in ChromaDB SQLite database for specific index"""
import sqlite3
chroma_db_path = str(self.indices_dir / "chroma_db" / "chroma.sqlite3")
collection_name = f"index_{index_id}"
try:
with sqlite3.connect(chroma_db_path) as conn:
cursor = conn.cursor()
# Get the collection_id first
cursor.execute("""
SELECT id FROM collections WHERE name = ?
""", (collection_name,))
collection_result = cursor.fetchone()
if collection_result:
collection_id = collection_result[0]
# Delete embedding metadata for this specific collection
cursor.execute("""
DELETE FROM embedding_metadata
WHERE id IN (
SELECT em.id FROM embedding_metadata em
JOIN embeddings e ON em.id = e.id
WHERE e.collection_id = ?
)
""", (collection_id,))
metadata_count = cursor.rowcount
# Delete embeddings for this collection
cursor.execute("""
DELETE FROM embeddings
WHERE collection_id = ?
""", (collection_id,))
embedding_count = cursor.rowcount
# Delete the collection record
cursor.execute("""
DELETE FROM collections
WHERE id = ?
""", (collection_id,))
conn.commit()
print(f"Cleaned up ChromaDB metadata for {collection_name}: {metadata_count} metadata entries, {embedding_count} embeddings")
return True
else:
print(f"No collection found with name {collection_name}")
return True # Not an error if collection doesn't exist
except Exception as e:
print(f"Warning: Could not clean up ChromaDB metadata for {collection_name}: {e}")
return False
def index_exists(self, index_id: str) -> bool:
"""Check if an index exists"""
index_dir = self.indices_dir / index_id
return index_dir.exists()
# Global RAG service instance
rag_service = RAGService()
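`_cleanup_chromadb_metadata` above deletes dependent rows before the collection row and uses parameterized queries throughout. A minimal sketch of that ordering against an in-memory SQLite database follows. The three-table schema is a toy built only from the table and column names the method's SQL refers to; it is an assumption for illustration and may not match ChromaDB's actual on-disk layout. `make_toy_db` and `delete_collection_rows` are hypothetical helpers.

```python
import sqlite3
from typing import Any, Dict


def make_toy_db() -> sqlite3.Connection:
    """Build an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE collections (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE embeddings (id INTEGER PRIMARY KEY, collection_id TEXT);
        CREATE TABLE embedding_metadata (id INTEGER, key TEXT, value TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO collections VALUES (?, ?)",
        [("c1", "index_a"), ("c2", "index_b")],
    )
    conn.executemany(
        "INSERT INTO embeddings VALUES (?, ?)",
        [(1, "c1"), (2, "c1"), (3, "c2")],
    )
    conn.executemany(
        "INSERT INTO embedding_metadata VALUES (?, ?, ?)",
        [(1, "document_id", "d1"), (2, "document_id", "d1"), (3, "document_id", "d2")],
    )
    conn.commit()
    return conn


def delete_collection_rows(conn: sqlite3.Connection, collection_name: str) -> Dict[str, Any]:
    """Delete one collection's rows, children first, and report the counts."""
    cur = conn.cursor()
    cur.execute("SELECT id FROM collections WHERE name = ?", (collection_name,))
    row = cur.fetchone()
    if row is None:
        return {"found": False, "metadata": 0, "embeddings": 0}
    collection_id = row[0]
    # Metadata rows reference embeddings, so they go first.
    cur.execute(
        "DELETE FROM embedding_metadata WHERE id IN "
        "(SELECT id FROM embeddings WHERE collection_id = ?)",
        (collection_id,),
    )
    metadata_count = cur.rowcount
    cur.execute("DELETE FROM embeddings WHERE collection_id = ?", (collection_id,))
    embedding_count = cur.rowcount
    cur.execute("DELETE FROM collections WHERE id = ?", (collection_id,))
    conn.commit()
    return {"found": True, "metadata": metadata_count, "embeddings": embedding_count}
```

Returning the counts instead of a bare boolean lets a caller distinguish "nothing to clean up" from "cleanup ran", which the `details` dictionary in `delete_index_complete` could surface if desired.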

@ -0,0 +1,234 @@
"""
SSO Service for Azure AD token validation and user management
"""
import httpx
import json
from typing import Optional, Dict, Any
from datetime import datetime
from jose import jwt, JWTError
from fastapi import HTTPException, status
from app.config.settings import settings
from app.models.user import UserInDB, UserCreate, AuthMethod, UserRole
from app.config.database import get_database
from app.core.security import get_password_hash
import logging
logger = logging.getLogger(__name__)
class SSOService:
def __init__(self):
self.authority = settings.azure_authority
self.client_id = settings.azure_client_id
self.tenant_id = settings.azure_tenant_id
self._discovery_cache = {}
self._jwks_cache = {}
async def get_discovery_document(self) -> Dict[str, Any]:
"""Get Azure AD OAuth2 endpoints for MSAL token validation"""
if not self.authority:
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Azure AD authority not configured"
)
# Standard Azure AD v2.0 endpoints for MSAL
discovery_doc = {
"authorization_endpoint": f"{self.authority}/oauth2/v2.0/authorize",
"token_endpoint": f"{self.authority}/oauth2/v2.0/token",
"jwks_uri": f"{self.authority}/discovery/v2.0/keys",
"issuer": f"https://login.microsoftonline.com/{self.tenant_id}/v2.0"
}
return discovery_doc
async def get_jwks(self) -> Dict[str, Any]:
"""Get JSON Web Key Set from Azure AD"""
discovery_doc = await self.get_discovery_document()
jwks_uri = discovery_doc.get("jwks_uri")
if not jwks_uri:
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="JWKS URI not found in discovery document"
)
if jwks_uri in self._jwks_cache:
return self._jwks_cache[jwks_uri]
try:
async with httpx.AsyncClient() as client:
response = await client.get(jwks_uri)
response.raise_for_status()
jwks = response.json()
self._jwks_cache[jwks_uri] = jwks
return jwks
except Exception as e:
logger.error(f"Failed to get JWKS: {e}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Failed to validate token: Cannot retrieve signing keys"
)
async def validate_token(self, token: str) -> Dict[str, Any]:
"""Validate Azure AD access token"""
try:
# Log token details for debugging
logger.info(f"Validating token...")
unverified_payload = jwt.get_unverified_claims(token)
# Full claims may contain personal data, so keep them out of info-level logs
logger.debug(f"Token payload (unverified): {unverified_payload}")
logger.info(f"Token issuer: {unverified_payload.get('iss')}")
logger.info(f"Token audience: {unverified_payload.get('aud')}")
logger.info(f"Expected audience: {self.client_id}")
logger.info(f"Expected issuer: https://login.microsoftonline.com/{self.tenant_id}/v2.0")
# Get JWKS for token validation
jwks = await self.get_jwks()
# Decode token header to get key ID
unverified_header = jwt.get_unverified_header(token)
kid = unverified_header.get("kid")
if not kid:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Token missing key ID"
)
# Find the correct key
key = None
for jwk in jwks.get("keys", []):
if jwk.get("kid") == kid:
key = jwk
break
if not key:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Token key not found in JWKS"
)
# Validate MSAL ID token with proper v2.0 validation
payload = jwt.decode(
token,
key,
algorithms=["RS256"],
audience=self.client_id,
issuer=f"https://login.microsoftonline.com/{self.tenant_id}/v2.0"
)
logger.info(f"Token validation successful: {payload}")
return payload
except JWTError as e:
logger.error(f"JWT validation error: {e}")
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail=f"Invalid token: {str(e)}"
)
except Exception as e:
logger.error(f"Token validation error: {e}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Token validation failed: {str(e)}"
)
def extract_user_info(self, token_payload: Dict[str, Any]) -> Dict[str, Any]:
"""Extract user information from token payload"""
return {
"sso_user_id": token_payload.get("sub"),
"sso_email": token_payload.get("email", token_payload.get("preferred_username")),
"sso_name": token_payload.get("name"),
"email": token_payload.get("email", token_payload.get("preferred_username")),
"tenant_id": token_payload.get("tid"),
"app_id": token_payload.get("appid"),
}

    async def get_or_create_user(self, token_payload: Dict[str, Any]) -> UserInDB:
        """Get existing SSO user or create new one"""
        user_info = self.extract_user_info(token_payload)
        sso_user_id = user_info.get("sso_user_id")
        email = user_info.get("email")

        if not sso_user_id or not email:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Token missing required user information"
            )

        db = get_database()
        users_collection = db["users"]

        # First try to find by SSO user ID
        user_doc = await users_collection.find_one({"sso_user_id": sso_user_id})

        # If not found, try to find by email (for existing local users upgrading to SSO)
        if not user_doc:
            user_doc = await users_collection.find_one({"email": email})

        if user_doc:
            # Update existing user with SSO info and last login
            update_data = {
                "auth_method": AuthMethod.SSO,
                "sso_provider": "azure",
                "sso_user_id": sso_user_id,
                "sso_email": user_info.get("sso_email"),
                "sso_name": user_info.get("sso_name"),
                "sso_attributes": token_payload,
                "last_sso_login": datetime.utcnow(),
                "updated_at": datetime.utcnow()
            }
            await users_collection.update_one(
                {"_id": user_doc["_id"]},
                {"$set": update_data}
            )
            # Fetch updated user
            user_doc = await users_collection.find_one({"_id": user_doc["_id"]})
            return UserInDB(**user_doc)
        else:
            # Create new SSO user with minimal access
            new_user = UserInDB(
                email=email,
                role=UserRole.USER,  # New SSO users default to 'user' role
                is_active=True,
                auth_method=AuthMethod.SSO,
                sso_provider="azure",
                sso_user_id=sso_user_id,
                sso_email=user_info.get("sso_email"),
                sso_name=user_info.get("sso_name"),
                sso_attributes=token_payload,
                last_sso_login=datetime.utcnow(),
                index_access=[],  # No index access initially
                created_at=datetime.utcnow(),
                updated_at=datetime.utcnow(),
                hashed_password=None  # SSO users don't need passwords
            )
            # Insert new user
            result = await users_collection.insert_one(new_user.model_dump(by_alias=True, exclude={"id"}))
            # Fetch created user
            user_doc = await users_collection.find_one({"_id": result.inserted_id})
            logger.info(f"Created new SSO user: {email}")
            return UserInDB(**user_doc)

    async def process_sso_login(self, access_token: str) -> UserInDB:
        """Complete SSO login process"""
        # Validate the token supplied by the client
        token_payload = await self.validate_token(access_token)

        # Get or create user
        user = await self.get_or_create_user(token_payload)

        # Check if user is active
        if not user.is_active:
            raise HTTPException(
                status_code=status.HTTP_401_UNAUTHORIZED,
                detail="User account is disabled"
            )

        return user


# Create singleton instance
sso_service = SSOService()
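
The key lookup that `validate_token` performs before calling `jwt.decode` can be illustrated in isolation. What follows is a minimal standard-library sketch, not the project's code: `read_unverified_header` and `find_jwk` are hypothetical helper names, and nothing here verifies a signature (that remains the job of `jwt.decode`).

```python
import base64
import json
from typing import Any, Dict, Optional


def read_unverified_header(token: str) -> Dict[str, Any]:
    """Decode the JOSE header (first segment) of a JWT without verifying it."""
    header_b64 = token.split(".")[0]
    # base64url segments omit padding, so restore it before decoding
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def find_jwk(jwks: Dict[str, Any], kid: Optional[str]) -> Optional[Dict[str, Any]]:
    """Return the first key in a JWKS document whose 'kid' matches, else None."""
    if kid is None:
        # Never match a key that simply lacks a 'kid' field
        return None
    return next((k for k in jwks.get("keys", []) if k.get("kid") == kid), None)
```

Returning `None` for a missing `kid` avoids accidentally selecting a JWKS entry that has no `kid` of its own; the caller can then respond with a 401 as the service does.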


@ -0,0 +1,9 @@
from .file_utils import validate_file, get_file_info, ensure_directory, clean_filename, get_upload_path

__all__ = [
    "validate_file",
    "get_file_info",
    "ensure_directory",
    "clean_filename",
    "get_upload_path"
]


@ -0,0 +1,87 @@
import os
import mimetypes
from pathlib import Path
from typing import Dict, Any, Optional
from fastapi import UploadFile


def validate_file(file: UploadFile) -> Dict[str, Any]:
    """Validate uploaded file and return file info"""
    if not file.filename:
        raise ValueError("No filename provided")

    # Get file extension
    file_path = Path(file.filename)
    extension = file_path.suffix.lower()

    # Get MIME type (guessed from the filename, not from the file contents)
    mime_type, _ = mimetypes.guess_type(file.filename)

    # Validate MIME type
    allowed_mime_types = {
        'application/pdf',
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'application/msword',
        'text/plain',
        'text/csv',
        'application/json',
        'text/html',
        'text/markdown',
        'application/rtf'
    }

    if mime_type not in allowed_mime_types:
        raise ValueError(f"MIME type {mime_type} not supported")

    return {
        'filename': file.filename,
        'extension': extension,
        'mime_type': mime_type,
        'size': file.size
    }


def get_file_info(file_path: Path) -> Dict[str, Any]:
    """Get information about a file"""
    if not file_path.exists():
        raise FileNotFoundError(f"File {file_path} not found")

    stat = file_path.stat()
    mime_type, _ = mimetypes.guess_type(str(file_path))

    return {
        'filename': file_path.name,
        'extension': file_path.suffix.lower(),
        'mime_type': mime_type,
        'size': stat.st_size,
        'created_at': stat.st_ctime,
        'modified_at': stat.st_mtime
    }


def ensure_directory(directory: Path) -> None:
    """Ensure directory exists"""
    directory.mkdir(parents=True, exist_ok=True)


def clean_filename(filename: str) -> str:
    """Clean filename to be filesystem-safe"""
    # Remove or replace problematic characters
    invalid_chars = '<>:"/\\|?*'
    cleaned = filename
    for char in invalid_chars:
        cleaned = cleaned.replace(char, '_')

    # Remove leading/trailing dots and spaces
    cleaned = cleaned.strip('. ')

    # Ensure it's not empty
    if not cleaned:
        cleaned = "unnamed_file"

    return cleaned


def get_upload_path(index_id: str, filename: str, base_dir: str) -> Path:
    """Generate upload path for a file"""
    base_path = Path(base_dir)
    # Sanitize index_id as well, so a value such as "../x" cannot escape base_dir
    index_path = base_path / clean_filename(index_id)
    ensure_directory(index_path)
    return index_path / clean_filename(filename)
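
Character replacement is one way to keep uploads inside their directory; a complementary check is to resolve the final path and confirm it still lies under the upload root. The sketch below is illustrative only: `safe_join` is a hypothetical helper that does not exist in this repository, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def safe_join(base_dir: str, *parts: str) -> Path:
    """Join parts under base_dir, refusing any result that escapes base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks where they exist
    candidate = base.joinpath(*parts).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"Path escapes upload root: {candidate}")
    return candidate
```

For example, `safe_join("/srv/uploads", "idx1", "a.pdf")` returns a path under the root, while `safe_join("/srv/uploads", "..", "etc", "passwd")` raises `ValueError`.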


@ -0,0 +1,40 @@
services:
  app:
    build: .
    ports:
      - "8001:8000"
    environment:
      # mongod runs with --auth, so the URL must carry the root credentials defined on the mongo service
      - MONGODB_URL=mongodb://netflix:netflix@mongo:27017/?authSource=admin
      - REDIS_URL=redis://redis:6379
      - DATABASE_NAME=contract_analysis
    depends_on:
      - mongo
      - redis
    volumes:
      - ./uploads:/app/uploads
      - ./indices:/app/indices
      - ./.env:/app/.env
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

  mongo:
    image: mongo:7
    ports:
      - "27017:27017"
    volumes:
      - mongo_data:/data/db
    environment:
      MONGO_INITDB_DATABASE: contract_analysis
      MONGO_INITDB_ROOT_USERNAME: netflix
      MONGO_INITDB_ROOT_PASSWORD: netflix
    command: mongod --auth

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

volumes:
  mongo_data:
  redis_data:
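
Because the `mongo` service starts with `--auth`, any client URL needs credentials, and special characters in them must be percent-encoded. A minimal sketch of building such a URL follows; `build_mongo_uri` is a hypothetical helper, and the `authSource=admin` default assumes the user was created through `MONGO_INITDB_ROOT_USERNAME`, which places it in the `admin` database.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_mongo_uri(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
    auth_source: str = "admin",
) -> str:
    """Build a mongodb:// URI, percent-encoding credentials when present."""
    if username is None or password is None:
        # No credentials supplied: plain, unauthenticated URI
        return f"mongodb://{host}:{port}"
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/?authSource={quote_plus(auth_source)}"
```

Reading the username and password from environment variables rather than hard-coding them keeps secrets out of the compose file.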


@ -0,0 +1,77 @@
#!/usr/bin/env python3
"""
Test script to verify chat fixes are working correctly.
Run this after starting the backend server.
"""

import asyncio
import sys
import os

# Add the app directory to Python path
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'app'))

from motor.motor_asyncio import AsyncIOMotorClient
from datetime import datetime
from services.llama_processor import llama_processor
from services.chat_context_service import chat_context_service


async def test_chat_fixes():
    print("🧪 Testing Chat Fixes...")

    # Test 1: Check if LlamaProcessor methods work
    print("\n1. Testing LlamaProcessor collection methods...")
    test_index_id = "test-index-123"

    # Test collection existence check
    exists = llama_processor.check_collection_exists(test_index_id)
    print(f" Collection exists: {exists}")

    # Test collection info
    info = llama_processor.get_collection_info(test_index_id)
    print(f" Collection info: {info}")

    # Test 2: Check MongoDB connection
    print("\n2. Testing MongoDB connection...")
    try:
        client = AsyncIOMotorClient("mongodb://localhost:27017")
        db = client.contract_analysis

        # Test document count
        doc_count = await db.documents.count_documents({})
        print(f" Total documents in DB: {doc_count}")

        # Test indices count
        indices_count = await db.indices.count_documents({})
        print(f" Total indices in DB: {indices_count}")

        # Motor's close() is synchronous; awaiting it raises TypeError
        client.close()
    except Exception as e:
        print(f" MongoDB connection failed: {e}")

    # Test 3: Test timestamp generation
    print("\n3. Testing timestamp generation...")
    now = datetime.utcnow()
    print(f" Current UTC timestamp: {now}")
    print(f" Formatted: {now.strftime('%Y-%m-%d %H:%M:%S')}")

    # Test 4: Test context service
    print("\n4. Testing context service...")
    try:
        # Test context formatting
        test_messages = [
            {"query": "What is this document about?", "response": "This document is about contracts.", "created_at": now},
            {"query": "Tell me more", "response": "It contains legal terms and conditions.", "created_at": now}
        ]
        formatted_context = chat_context_service.format_context_for_ai(test_messages)
        print(f" Context formatted successfully: {len(formatted_context)} characters")
    except Exception as e:
        print(f" Context service test failed: {e}")

    print("\n✅ Chat fixes test completed!")


if __name__ == "__main__":
    asyncio.run(test_chat_fixes())
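
The timestamp step of this script relies on `datetime.utcnow()`, which returns a naive datetime and is deprecated as of Python 3.12. A timezone-aware alternative can be sketched as follows; `utc_now` and `format_timestamp` are hypothetical names chosen for illustration, not functions from this repository.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def format_timestamp(ts: datetime) -> str:
    """Format an aware datetime in UTC using the script's display format.

    A naive datetime passed here is interpreted as local time by astimezone().
    """
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```

Aware datetimes compare safely across timezones, whereas mixing naive and aware values raises `TypeError` on comparison.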

31
contract-query.service Normal file

@ -0,0 +1,31 @@
[Unit]
Description=Contract Query Backend API
After=network.target
Wants=network.target
[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/html/contract-query/backend
Environment="PATH=/var/www/html/contract-query/backend/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/var/www/html/contract-query/backend/venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 8001
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=false
ProtectHome=false
ReadWritePaths=/var/www/html/contract-query/backend
# Environment variables
EnvironmentFile=/var/www/html/contract-query/backend/.env
[Install]
WantedBy=multi-user.target

1183
contracts_documentation.md Normal file

File diff suppressed because it is too large

2
frontend/.env.example Normal file

@ -0,0 +1,2 @@
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool

20
frontend/.eslintrc.js Normal file

@ -0,0 +1,20 @@
module.exports = {
  root: true,
  env: { browser: true, es2020: true },
  extends: [
    'eslint:recommended',
    'plugin:react/recommended',
    'plugin:react/jsx-runtime',
    'plugin:react-hooks/recommended',
  ],
  ignorePatterns: ['dist', '.eslintrc.js'],
  parserOptions: { ecmaVersion: 'latest', sourceType: 'module' },
  settings: { react: { version: '18.2' } },
  plugins: ['react-refresh'],
  rules: {
    'react-refresh/only-export-components': [
      'warn',
      { allowConstantExport: true },
    ],
  },
}

16
frontend/index.html Normal file

@ -0,0 +1,16 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Montserrat:ital,wght@0,100..900;1,100..900&display=swap" rel="stylesheet">
    <title>Contract Analysis Tool</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.jsx"></script>
  </body>
</html>

1
frontend/node_modules/.bin/acorn generated vendored Symbolic link

@ -0,0 +1 @@
../acorn/bin/acorn

1
frontend/node_modules/.bin/autoprefixer generated vendored Symbolic link

@ -0,0 +1 @@
../autoprefixer/bin/autoprefixer

1
frontend/node_modules/.bin/browserslist generated vendored Symbolic link

@ -0,0 +1 @@
../browserslist/cli.js

1
frontend/node_modules/.bin/cssesc generated vendored Symbolic link

@ -0,0 +1 @@
../cssesc/bin/cssesc

1
frontend/node_modules/.bin/esbuild generated vendored Symbolic link

@ -0,0 +1 @@
../esbuild/bin/esbuild

1
frontend/node_modules/.bin/eslint generated vendored Symbolic link

@ -0,0 +1 @@
../eslint/bin/eslint.js

1
frontend/node_modules/.bin/jiti generated vendored Symbolic link

@ -0,0 +1 @@
../jiti/bin/jiti.js

1
frontend/node_modules/.bin/js-yaml generated vendored Symbolic link

@ -0,0 +1 @@
../js-yaml/bin/js-yaml.js

1
frontend/node_modules/.bin/jsesc generated vendored Symbolic link

@ -0,0 +1 @@
../jsesc/bin/jsesc

1
frontend/node_modules/.bin/json5 generated vendored Symbolic link

@ -0,0 +1 @@
../json5/lib/cli.js

1
frontend/node_modules/.bin/loose-envify generated vendored Symbolic link

@ -0,0 +1 @@
../loose-envify/cli.js

1
frontend/node_modules/.bin/nanoid generated vendored Symbolic link

@ -0,0 +1 @@
../nanoid/bin/nanoid.cjs

1
frontend/node_modules/.bin/node-which generated vendored Symbolic link

@ -0,0 +1 @@
../which/bin/node-which

1
frontend/node_modules/.bin/parser generated vendored Symbolic link

@ -0,0 +1 @@
../@babel/parser/bin/babel-parser.js

1
frontend/node_modules/.bin/resolve generated vendored Symbolic link

@ -0,0 +1 @@
../resolve/bin/resolve

1
frontend/node_modules/.bin/rimraf generated vendored Symbolic link

@ -0,0 +1 @@
../rimraf/bin.js

1
frontend/node_modules/.bin/rollup generated vendored Symbolic link

@ -0,0 +1 @@
../rollup/dist/bin/rollup

1
frontend/node_modules/.bin/semver generated vendored Symbolic link

@ -0,0 +1 @@
../semver/bin/semver.js

1
frontend/node_modules/.bin/showdown generated vendored Symbolic link

@ -0,0 +1 @@
../showdown/bin/showdown.js

1
frontend/node_modules/.bin/sucrase generated vendored Symbolic link

@ -0,0 +1 @@
../sucrase/bin/sucrase

1
frontend/node_modules/.bin/sucrase-node generated vendored Symbolic link

@ -0,0 +1 @@
../sucrase/bin/sucrase-node

1
frontend/node_modules/.bin/tailwind generated vendored Symbolic link

@ -0,0 +1 @@
../tailwindcss/lib/cli.js

1
frontend/node_modules/.bin/tailwindcss generated vendored Symbolic link

@ -0,0 +1 @@
../tailwindcss/lib/cli.js

1
frontend/node_modules/.bin/update-browserslist-db generated vendored Symbolic link

@ -0,0 +1 @@
../update-browserslist-db/cli.js

1
frontend/node_modules/.bin/vite generated vendored Symbolic link

@ -0,0 +1 @@
../vite/bin/vite.js

1
frontend/node_modules/.bin/yaml generated vendored Symbolic link

@ -0,0 +1 @@
../yaml/bin.mjs

6340
frontend/node_modules/.package-lock.json generated vendored Normal file

File diff suppressed because it is too large

126
frontend/node_modules/.vite/deps/@azure_msal-browser.js generated vendored Normal file

@ -0,0 +1,126 @@
import {
AccountEntity,
ApiId,
AuthError,
AuthErrorCodes_exports,
AuthErrorMessage,
AuthenticationHeaderParser,
AuthenticationScheme,
AzureCloudInstance,
BrowserAuthError,
BrowserAuthErrorCodes_exports,
BrowserAuthErrorMessage,
BrowserCacheLocation,
BrowserConfigurationAuthError,
BrowserConfigurationAuthErrorCodes_exports,
BrowserConfigurationAuthErrorMessage,
BrowserPerformanceClient,
BrowserUtils_exports,
CacheLookupPolicy,
ClientAuthError,
ClientAuthErrorCodes_exports,
ClientAuthErrorMessage,
ClientConfigurationError,
ClientConfigurationErrorCodes_exports,
ClientConfigurationErrorMessage,
DEFAULT_IFRAME_TIMEOUT_MS,
EventHandler,
EventMessageUtils,
EventType,
InteractionRequiredAuthError,
InteractionRequiredAuthErrorCodes_exports,
InteractionRequiredAuthErrorMessage,
InteractionStatus,
InteractionType,
JsonWebTokenTypes,
LocalStorage,
LogLevel,
Logger,
MemoryStorage,
NavigationClient,
OIDC_DEFAULT_SCOPES,
PerformanceEvents,
PromptValue,
ProtocolMode,
PublicClientApplication,
PublicClientNext,
ServerError,
ServerResponseType,
SessionStorage,
SignedHttpRequest,
StringUtils,
StubPerformanceClient,
UrlString,
WrapperSKU,
createNestablePublicClientApplication,
createStandardPublicClientApplication,
isPlatformBrokerAvailable,
stubbedPublicClientApplication,
version
} from "./chunk-DRYC2OA6.js";
import {
BrowserPerformanceMeasurement
} from "./chunk-U543NHT4.js";
import "./chunk-G3PMV62Z.js";
export {
AccountEntity,
ApiId,
AuthError,
AuthErrorCodes_exports as AuthErrorCodes,
AuthErrorMessage,
AuthenticationHeaderParser,
AuthenticationScheme,
AzureCloudInstance,
BrowserAuthError,
BrowserAuthErrorCodes_exports as BrowserAuthErrorCodes,
BrowserAuthErrorMessage,
BrowserCacheLocation,
BrowserConfigurationAuthError,
BrowserConfigurationAuthErrorCodes_exports as BrowserConfigurationAuthErrorCodes,
BrowserConfigurationAuthErrorMessage,
BrowserPerformanceClient,
BrowserPerformanceMeasurement,
BrowserUtils_exports as BrowserUtils,
CacheLookupPolicy,
ClientAuthError,
ClientAuthErrorCodes_exports as ClientAuthErrorCodes,
ClientAuthErrorMessage,
ClientConfigurationError,
ClientConfigurationErrorCodes_exports as ClientConfigurationErrorCodes,
ClientConfigurationErrorMessage,
DEFAULT_IFRAME_TIMEOUT_MS,
EventHandler,
EventMessageUtils,
EventType,
InteractionRequiredAuthError,
InteractionRequiredAuthErrorCodes_exports as InteractionRequiredAuthErrorCodes,
InteractionRequiredAuthErrorMessage,
InteractionStatus,
InteractionType,
JsonWebTokenTypes,
LocalStorage,
LogLevel,
Logger,
MemoryStorage,
NavigationClient,
OIDC_DEFAULT_SCOPES,
PerformanceEvents,
PromptValue,
ProtocolMode,
PublicClientApplication,
PublicClientNext,
ServerError,
ServerResponseType,
SessionStorage,
SignedHttpRequest,
StringUtils,
StubPerformanceClient,
UrlString,
WrapperSKU,
createNestablePublicClientApplication,
createStandardPublicClientApplication,
isPlatformBrokerAvailable,
stubbedPublicClientApplication,
version
};
//# sourceMappingURL=@azure_msal-browser.js.map


@ -0,0 +1,7 @@
{
"version": 3,
"sources": [],
"sourcesContent": [],
"mappings": "",
"names": []
}

561
frontend/node_modules/.vite/deps/@azure_msal-react.js generated vendored Normal file

@ -0,0 +1,561 @@
import {
AccountEntity,
AuthError,
EventMessageUtils,
EventType,
InteractionRequiredAuthError,
InteractionStatus,
InteractionType,
Logger,
OIDC_DEFAULT_SCOPES,
WrapperSKU,
stubbedPublicClientApplication
} from "./chunk-DRYC2OA6.js";
import "./chunk-U543NHT4.js";
import {
require_react
} from "./chunk-DRWLMN53.js";
import {
__toESM
} from "./chunk-G3PMV62Z.js";
// node_modules/@azure/msal-react/dist/MsalContext.js
var React = __toESM(require_react(), 1);
var defaultMsalContext = {
instance: stubbedPublicClientApplication,
inProgress: InteractionStatus.None,
accounts: [],
logger: new Logger({})
};
var MsalContext = React.createContext(defaultMsalContext);
var MsalConsumer = MsalContext.Consumer;
// node_modules/@azure/msal-react/dist/MsalProvider.js
var import_react = __toESM(require_react(), 1);
// node_modules/@azure/msal-react/dist/utils/utilities.js
function getChildrenOrFunction(children, args) {
if (typeof children === "function") {
return children(args);
}
return children;
}
function accountArraysAreEqual(arrayA, arrayB) {
if (arrayA.length !== arrayB.length) {
return false;
}
const comparisonArray = [...arrayB];
return arrayA.every((elementA) => {
const elementB = comparisonArray.shift();
if (!elementA || !elementB) {
return false;
}
return elementA.homeAccountId === elementB.homeAccountId && elementA.localAccountId === elementB.localAccountId && elementA.username === elementB.username;
});
}
function getAccountByIdentifiers(allAccounts, accountIdentifiers) {
if (allAccounts.length > 0 && (accountIdentifiers.homeAccountId || accountIdentifiers.localAccountId || accountIdentifiers.username)) {
const matchedAccounts = allAccounts.filter((accountObj) => {
if (accountIdentifiers.username && accountIdentifiers.username.toLowerCase() !== accountObj.username.toLowerCase()) {
return false;
}
if (accountIdentifiers.homeAccountId && accountIdentifiers.homeAccountId.toLowerCase() !== accountObj.homeAccountId.toLowerCase()) {
return false;
}
if (accountIdentifiers.localAccountId && accountIdentifiers.localAccountId.toLowerCase() !== accountObj.localAccountId.toLowerCase()) {
return false;
}
return true;
});
return matchedAccounts[0] || null;
} else {
return null;
}
}
// node_modules/@azure/msal-react/dist/packageMetadata.js
var name = "@azure/msal-react";
var version = "3.0.15";
// node_modules/@azure/msal-react/dist/MsalProvider.js
var MsalProviderActionType = {
UNBLOCK_INPROGRESS: "UNBLOCK_INPROGRESS",
EVENT: "EVENT"
};
var reducer = (previousState, action) => {
const { type, payload } = action;
let newInProgress = previousState.inProgress;
switch (type) {
case MsalProviderActionType.UNBLOCK_INPROGRESS:
if (previousState.inProgress === InteractionStatus.Startup) {
newInProgress = InteractionStatus.None;
payload.logger.info("MsalProvider - handleRedirectPromise resolved, setting inProgress to 'none'");
}
break;
case MsalProviderActionType.EVENT:
const message = payload.message;
const status = EventMessageUtils.getInteractionStatusFromEvent(message, previousState.inProgress);
if (status) {
payload.logger.info(`MsalProvider - ${message.eventType} results in setting inProgress from ${previousState.inProgress} to ${status}`);
newInProgress = status;
}
break;
default:
throw new Error(`Unknown action type: ${type}`);
}
if (newInProgress === InteractionStatus.Startup) {
return previousState;
}
const currentAccounts = payload.instance.getAllAccounts();
if (newInProgress !== previousState.inProgress && !accountArraysAreEqual(currentAccounts, previousState.accounts)) {
return {
...previousState,
inProgress: newInProgress,
accounts: currentAccounts
};
} else if (newInProgress !== previousState.inProgress) {
return {
...previousState,
inProgress: newInProgress
};
} else if (!accountArraysAreEqual(currentAccounts, previousState.accounts)) {
return {
...previousState,
accounts: currentAccounts
};
} else {
return previousState;
}
};
function MsalProvider({ instance, children }) {
(0, import_react.useEffect)(() => {
instance.initializeWrapperLibrary(WrapperSKU.React, version);
}, [instance]);
const logger = (0, import_react.useMemo)(() => {
return instance.getLogger().clone(name, version);
}, [instance]);
const [state, updateState] = (0, import_react.useReducer)(reducer, void 0, () => {
return {
inProgress: InteractionStatus.Startup,
accounts: []
};
});
(0, import_react.useEffect)(() => {
const callbackId = instance.addEventCallback((message) => {
updateState({
payload: {
instance,
logger,
message
},
type: MsalProviderActionType.EVENT
});
});
logger.verbose(`MsalProvider - Registered event callback with id: ${callbackId}`);
instance.initialize().then(() => {
instance.handleRedirectPromise().catch(() => {
return;
}).finally(() => {
updateState({
payload: {
instance,
logger
},
type: MsalProviderActionType.UNBLOCK_INPROGRESS
});
});
}).catch(() => {
return;
});
return () => {
if (callbackId) {
logger.verbose(`MsalProvider - Removing event callback ${callbackId}`);
instance.removeEventCallback(callbackId);
}
};
}, [instance, logger]);
const contextValue = {
instance,
inProgress: state.inProgress,
accounts: state.accounts,
logger
};
return import_react.default.createElement(MsalContext.Provider, { value: contextValue }, children);
}
// node_modules/@azure/msal-react/dist/components/AuthenticatedTemplate.js
var import_react4 = __toESM(require_react(), 1);
// node_modules/@azure/msal-react/dist/hooks/useMsal.js
var import_react2 = __toESM(require_react(), 1);
var useMsal = () => (0, import_react2.useContext)(MsalContext);
// node_modules/@azure/msal-react/dist/hooks/useIsAuthenticated.js
var import_react3 = __toESM(require_react(), 1);
function isAuthenticated(allAccounts, matchAccount) {
if (matchAccount && (matchAccount.username || matchAccount.homeAccountId || matchAccount.localAccountId)) {
return !!getAccountByIdentifiers(allAccounts, matchAccount);
}
return allAccounts.length > 0;
}
function useIsAuthenticated(matchAccount) {
const { accounts: allAccounts, inProgress } = useMsal();
const isUserAuthenticated = (0, import_react3.useMemo)(() => {
if (inProgress === InteractionStatus.Startup) {
return false;
}
return isAuthenticated(allAccounts, matchAccount);
}, [allAccounts, inProgress, matchAccount]);
return isUserAuthenticated;
}
// node_modules/@azure/msal-react/dist/components/AuthenticatedTemplate.js
function AuthenticatedTemplate({ username, homeAccountId, localAccountId, children }) {
const context = useMsal();
const accountIdentifier = (0, import_react4.useMemo)(() => {
return {
username,
homeAccountId,
localAccountId
};
}, [username, homeAccountId, localAccountId]);
const isAuthenticated2 = useIsAuthenticated(accountIdentifier);
if (isAuthenticated2 && context.inProgress !== InteractionStatus.Startup) {
return import_react4.default.createElement(import_react4.default.Fragment, null, getChildrenOrFunction(children, context));
}
return null;
}
// node_modules/@azure/msal-react/dist/components/UnauthenticatedTemplate.js
var import_react5 = __toESM(require_react(), 1);
function UnauthenticatedTemplate({ username, homeAccountId, localAccountId, children }) {
const context = useMsal();
const accountIdentifier = (0, import_react5.useMemo)(() => {
return {
username,
homeAccountId,
localAccountId
};
}, [username, homeAccountId, localAccountId]);
const isAuthenticated2 = useIsAuthenticated(accountIdentifier);
if (!isAuthenticated2 && context.inProgress !== InteractionStatus.Startup && context.inProgress !== InteractionStatus.HandleRedirect) {
return import_react5.default.createElement(import_react5.default.Fragment, null, getChildrenOrFunction(children, context));
}
return null;
}
// node_modules/@azure/msal-react/dist/components/MsalAuthenticationTemplate.js
var import_react8 = __toESM(require_react(), 1);
// node_modules/@azure/msal-react/dist/hooks/useMsalAuthentication.js
var import_react7 = __toESM(require_react(), 1);
// node_modules/@azure/msal-react/dist/hooks/useAccount.js
var import_react6 = __toESM(require_react(), 1);
function getAccount(instance, accountIdentifiers) {
if (!accountIdentifiers || !accountIdentifiers.homeAccountId && !accountIdentifiers.localAccountId && !accountIdentifiers.username) {
return instance.getActiveAccount();
}
return getAccountByIdentifiers(instance.getAllAccounts(), accountIdentifiers);
}
function useAccount(accountIdentifiers) {
const { instance, inProgress, logger } = useMsal();
const [account, setAccount] = (0, import_react6.useState)(() => {
if (inProgress === InteractionStatus.Startup) {
return null;
} else {
return getAccount(instance, accountIdentifiers);
}
});
(0, import_react6.useEffect)(() => {
if (inProgress !== InteractionStatus.Startup) {
setAccount((currentAccount) => {
const nextAccount = getAccount(instance, accountIdentifiers);
if (!AccountEntity.accountInfoIsEqual(currentAccount, nextAccount, true)) {
logger.info("useAccount - Updating account");
return nextAccount;
}
return currentAccount;
});
}
}, [inProgress, accountIdentifiers, instance, logger]);
return account;
}
// node_modules/@azure/msal-react/dist/error/ReactAuthError.js
var ReactAuthErrorMessage = {
invalidInteractionType: {
code: "invalid_interaction_type",
desc: "The provided interaction type is invalid."
},
unableToFallbackToInteraction: {
code: "unable_to_fallback_to_interaction",
desc: "Interaction is required but another interaction is already in progress. Please try again when the current interaction is complete."
}
};
var ReactAuthError = class _ReactAuthError extends AuthError {
constructor(errorCode, errorMessage) {
super(errorCode, errorMessage);
Object.setPrototypeOf(this, _ReactAuthError.prototype);
this.name = "ReactAuthError";
}
static createInvalidInteractionTypeError() {
return new _ReactAuthError(ReactAuthErrorMessage.invalidInteractionType.code, ReactAuthErrorMessage.invalidInteractionType.desc);
}
static createUnableToFallbackToInteractionError() {
return new _ReactAuthError(ReactAuthErrorMessage.unableToFallbackToInteraction.code, ReactAuthErrorMessage.unableToFallbackToInteraction.desc);
}
};
// node_modules/@azure/msal-react/dist/hooks/useMsalAuthentication.js
function useMsalAuthentication(interactionType, authenticationRequest, accountIdentifiers) {
const { instance, inProgress, logger } = useMsal();
const isAuthenticated2 = useIsAuthenticated(accountIdentifiers);
const account = useAccount(accountIdentifiers);
const [[result, error], setResponse] = (0, import_react7.useState)([null, null]);
const mounted = (0, import_react7.useRef)(true);
(0, import_react7.useEffect)(() => {
return () => {
mounted.current = false;
};
}, []);
const interactionInProgress = (0, import_react7.useRef)(inProgress !== InteractionStatus.None);
(0, import_react7.useEffect)(() => {
interactionInProgress.current = inProgress !== InteractionStatus.None;
}, [inProgress]);
const shouldAcquireToken = (0, import_react7.useRef)(true);
(0, import_react7.useEffect)(() => {
if (!!error) {
shouldAcquireToken.current = false;
return;
}
if (!!result) {
shouldAcquireToken.current = false;
return;
}
}, [error, result]);
const login = (0, import_react7.useCallback)(async (callbackInteractionType, callbackRequest) => {
const loginType = callbackInteractionType || interactionType;
const loginRequest = callbackRequest || authenticationRequest;
switch (loginType) {
case InteractionType.Popup:
logger.verbose("useMsalAuthentication - Calling loginPopup");
return instance.loginPopup(loginRequest);
case InteractionType.Redirect:
logger.verbose("useMsalAuthentication - Calling loginRedirect");
return instance.loginRedirect(loginRequest).then(null);
case InteractionType.Silent:
logger.verbose("useMsalAuthentication - Calling ssoSilent");
return instance.ssoSilent(loginRequest);
default:
throw ReactAuthError.createInvalidInteractionTypeError();
}
}, [instance, interactionType, authenticationRequest, logger]);
const acquireToken = (0, import_react7.useCallback)(async (callbackInteractionType, callbackRequest) => {
const fallbackInteractionType = callbackInteractionType || interactionType;
let tokenRequest;
if (callbackRequest) {
logger.trace("useMsalAuthentication - acquireToken - Using request provided in the callback");
tokenRequest = {
...callbackRequest
};
} else if (authenticationRequest) {
logger.trace("useMsalAuthentication - acquireToken - Using request provided in the hook");
tokenRequest = {
...authenticationRequest,
scopes: authenticationRequest.scopes || OIDC_DEFAULT_SCOPES
};
} else {
logger.trace("useMsalAuthentication - acquireToken - No request object provided, using default request.");
tokenRequest = {
scopes: OIDC_DEFAULT_SCOPES
};
}
if (!tokenRequest.account && account) {
logger.trace("useMsalAuthentication - acquireToken - Attaching account to request");
tokenRequest.account = account;
}
const getToken = async () => {
logger.verbose("useMsalAuthentication - Calling acquireTokenSilent");
return instance.acquireTokenSilent(tokenRequest).catch(async (e) => {
if (e instanceof InteractionRequiredAuthError) {
if (!interactionInProgress.current) {
logger.error("useMsalAuthentication - Interaction required, falling back to interaction");
return login(fallbackInteractionType, tokenRequest);
} else {
logger.error("useMsalAuthentication - Interaction required but is already in progress. Please try again, if needed, after interaction completes.");
throw ReactAuthError.createUnableToFallbackToInteractionError();
}
}
throw e;
});
};
return getToken().then((response) => {
if (mounted.current) {
setResponse([response, null]);
}
return response;
}).catch((e) => {
if (mounted.current) {
setResponse([null, e]);
}
throw e;
});
}, [
instance,
interactionType,
authenticationRequest,
logger,
account,
login
]);
(0, import_react7.useEffect)(() => {
const callbackId = instance.addEventCallback((message) => {
switch (message.eventType) {
case EventType.LOGIN_SUCCESS:
case EventType.SSO_SILENT_SUCCESS:
if (message.payload) {
setResponse([
message.payload,
null
]);
}
break;
case EventType.LOGIN_FAILURE:
case EventType.SSO_SILENT_FAILURE:
if (message.error) {
setResponse([null, message.error]);
}
break;
}
});
logger.verbose(`useMsalAuthentication - Registered event callback with id: ${callbackId}`);
return () => {
if (callbackId) {
logger.verbose(`useMsalAuthentication - Removing event callback ${callbackId}`);
instance.removeEventCallback(callbackId);
}
};
}, [instance, logger]);
(0, import_react7.useEffect)(() => {
if (shouldAcquireToken.current && inProgress === InteractionStatus.None) {
if (!isAuthenticated2) {
shouldAcquireToken.current = false;
logger.info("useMsalAuthentication - No user is authenticated, attempting to login");
login().catch(() => {
return;
});
} else if (account) {
shouldAcquireToken.current = false;
logger.info("useMsalAuthentication - User is authenticated, attempting to acquire token");
acquireToken().catch(() => {
return;
});
}
}
}, [isAuthenticated2, account, inProgress, login, acquireToken, logger]);
return {
login,
acquireToken,
result,
error
};
}
// node_modules/@azure/msal-react/dist/components/MsalAuthenticationTemplate.js
function MsalAuthenticationTemplate({ interactionType, username, homeAccountId, localAccountId, authenticationRequest, loadingComponent: LoadingComponent, errorComponent: ErrorComponent, children }) {
const accountIdentifier = (0, import_react8.useMemo)(() => {
return {
username,
homeAccountId,
localAccountId
};
}, [username, homeAccountId, localAccountId]);
const context = useMsal();
const msalAuthResult = useMsalAuthentication(interactionType, authenticationRequest, accountIdentifier);
const isAuthenticated2 = useIsAuthenticated(accountIdentifier);
if (msalAuthResult.error && context.inProgress === InteractionStatus.None) {
if (!!ErrorComponent) {
return import_react8.default.createElement(ErrorComponent, { ...msalAuthResult });
}
throw msalAuthResult.error;
}
if (isAuthenticated2) {
return import_react8.default.createElement(import_react8.default.Fragment, null, getChildrenOrFunction(children, msalAuthResult));
}
if (!!LoadingComponent && context.inProgress !== InteractionStatus.None) {
return import_react8.default.createElement(LoadingComponent, { ...context });
}
return null;
}
// node_modules/@azure/msal-react/dist/components/withMsal.js
var import_react9 = __toESM(require_react(), 1);
var withMsal = (Component) => {
const ComponentWithMsal = (props) => {
const msal = useMsal();
return import_react9.default.createElement(Component, { ...props, msalContext: msal });
};
const componentName = Component.displayName || Component.name || "Component";
ComponentWithMsal.displayName = `withMsal(${componentName})`;
return ComponentWithMsal;
};
export {
AuthenticatedTemplate,
MsalAuthenticationTemplate,
MsalConsumer,
MsalContext,
MsalProvider,
UnauthenticatedTemplate,
useAccount,
useIsAuthenticated,
useMsal,
useMsalAuthentication,
version,
withMsal
};
/*! Bundled license information:
@azure/msal-react/dist/MsalContext.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/utils/utilities.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/packageMetadata.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/MsalProvider.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/hooks/useMsal.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/hooks/useIsAuthenticated.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/components/AuthenticatedTemplate.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/components/UnauthenticatedTemplate.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/hooks/useAccount.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/error/ReactAuthError.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/hooks/useMsalAuthentication.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/components/MsalAuthenticationTemplate.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/components/withMsal.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
@azure/msal-react/dist/index.js:
(*! @azure/msal-react v3.0.15 2025-07-08 *)
*/
//# sourceMappingURL=@azure_msal-react.js.map

File diff suppressed because one or more lines are too long

@ -0,0 +1,8 @@
import {
BrowserPerformanceMeasurement
} from "./chunk-U543NHT4.js";
import "./chunk-G3PMV62Z.js";
export {
BrowserPerformanceMeasurement
};
//# sourceMappingURL=BrowserPerformanceMeasurement-DAXTNWC7.js.map

@ -0,0 +1,7 @@
{
"version": 3,
"sources": [],
"sourcesContent": [],
"mappings": "",
"names": []
}

118
frontend/node_modules/.vite/deps/_metadata.json generated vendored Normal file

@ -0,0 +1,118 @@
{
"hash": "42012998",
"configHash": "2337c4bc",
"lockfileHash": "2912a9cc",
"browserHash": "4c751707",
"optimized": {
"react": {
"src": "../../react/index.js",
"file": "react.js",
"fileHash": "f754a42b",
"needsInterop": true
},
"react-dom": {
"src": "../../react-dom/index.js",
"file": "react-dom.js",
"fileHash": "93d3be4a",
"needsInterop": true
},
"react/jsx-dev-runtime": {
"src": "../../react/jsx-dev-runtime.js",
"file": "react_jsx-dev-runtime.js",
"fileHash": "58e54cfa",
"needsInterop": true
},
"react/jsx-runtime": {
"src": "../../react/jsx-runtime.js",
"file": "react_jsx-runtime.js",
"fileHash": "452eb51a",
"needsInterop": true
},
"@azure/msal-browser": {
"src": "../../@azure/msal-browser/dist/index.mjs",
"file": "@azure_msal-browser.js",
"fileHash": "af1eb922",
"needsInterop": false
},
"@azure/msal-react": {
"src": "../../@azure/msal-react/dist/index.js",
"file": "@azure_msal-react.js",
"fileHash": "78fcd178",
"needsInterop": false
},
"axios": {
"src": "../../axios/index.js",
"file": "axios.js",
"fileHash": "a41d57c8",
"needsInterop": false
},
"lucide-react": {
"src": "../../lucide-react/dist/esm/lucide-react.js",
"file": "lucide-react.js",
"fileHash": "ab000d47",
"needsInterop": false
},
"react-dom/client": {
"src": "../../react-dom/client.js",
"file": "react-dom_client.js",
"fileHash": "977e8266",
"needsInterop": true
},
"react-dropzone": {
"src": "../../react-dropzone/dist/es/index.js",
"file": "react-dropzone.js",
"fileHash": "c6fc4427",
"needsInterop": false
},
"react-hook-form": {
"src": "../../react-hook-form/dist/index.esm.mjs",
"file": "react-hook-form.js",
"fileHash": "76d14c40",
"needsInterop": false
},
"react-hot-toast": {
"src": "../../react-hot-toast/dist/index.mjs",
"file": "react-hot-toast.js",
"fileHash": "4cb655f1",
"needsInterop": false
},
"react-query": {
"src": "../../react-query/es/index.js",
"file": "react-query.js",
"fileHash": "72a776d0",
"needsInterop": false
},
"react-router-dom": {
"src": "../../react-router-dom/dist/index.js",
"file": "react-router-dom.js",
"fileHash": "1b99dcc7",
"needsInterop": false
},
"showdown": {
"src": "../../showdown/dist/showdown.js",
"file": "showdown.js",
"fileHash": "d5940d71",
"needsInterop": true
}
},
"chunks": {
"BrowserPerformanceMeasurement-DAXTNWC7": {
"file": "BrowserPerformanceMeasurement-DAXTNWC7.js"
},
"chunk-PJEEZAML": {
"file": "chunk-PJEEZAML.js"
},
"chunk-DRYC2OA6": {
"file": "chunk-DRYC2OA6.js"
},
"chunk-U543NHT4": {
"file": "chunk-U543NHT4.js"
},
"chunk-DRWLMN53": {
"file": "chunk-DRWLMN53.js"
},
"chunk-G3PMV62Z": {
"file": "chunk-G3PMV62Z.js"
}
}
}

2523
frontend/node_modules/.vite/deps/axios.js generated vendored Normal file

File diff suppressed because it is too large

Some files were not shown because too many files have changed in this diff