initial commit
commit 82be78c7ae
10829 changed files with 1270486 additions and 0 deletions
74
.gitignore
vendored
Normal file
@@ -0,0 +1,74 @@
# Environment files - NEVER commit these
.env
.env.local
.env.production

# Virtual Environment
llama-index/

# Data and uploads (contains sensitive documents)
data/
indices/

# venv
venv/

# Logs (may contain sensitive information)
*.log
*.txt

# Python cache
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# OS files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# IDE files
.vscode/
.idea/
*.swp
*.swo
*~

# Database files
*.sqlite
*.sqlite3
*.db

# Temporary files
*.tmp
*.temp
*.backup
*.bak

# API keys or sensitive config (backup safety)
config.local.php
secrets.php

# Error logs
error_log
error.log
618
API_TESTING_GUIDE.md
Normal file
@@ -0,0 +1,618 @@
# Contract Analysis Tool - API Testing Guide

This guide walks through testing every API in the Contract Analysis Tool step by step, using `curl` commands with realistic sample inputs and the responses to expect.

## Prerequisites

1. **Backend Server Running**: `http://localhost:8000`
2. **Frontend Server Running**: `http://localhost:3000`
3. **MongoDB Running**: `localhost:27017`
4. **Redis Running**: `localhost:6379`
5. **Environment Variables**: Ensure `.env` files are properly configured
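Before running the examples, a quick preflight check can confirm that each prerequisite service is reachable. The sketch below is a hypothetical helper, not part of the project: it assumes the default local ports listed above, uses `curl` for the two HTTP servers, and uses bash's built-in `/dev/tcp` redirection (with coreutils `timeout`) for MongoDB and Redis. It only shows that something is listening, not that the service is healthy.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check for the prerequisites listed above.
# Assumes the default local ports from this guide.

# Succeeds if a TCP connection to host:port opens within 2 seconds.
# Relies on bash's /dev/tcp redirection and coreutils `timeout`.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Succeeds if the URL answers with an HTTP status below 400.
http_ok() {
  curl -sf -o /dev/null --max-time 5 "$1"
}

preflight() {
  local failed=0
  http_ok http://localhost:8000/health || { echo "backend:  no response on :8000/health"; failed=1; }
  http_ok http://localhost:3000        || { echo "frontend: no response on :3000"; failed=1; }
  port_open localhost 27017            || { echo "mongodb:  nothing listening on :27017"; failed=1; }
  port_open localhost 6379             || { echo "redis:    nothing listening on :6379"; failed=1; }
  if [ "$failed" -eq 0 ]; then
    echo "all prerequisites reachable"
  fi
  return "$failed"
}
```

Run `preflight` once before starting; a non-zero exit status means at least one service needs attention.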
## Authentication Setup

### Step 1: Initialize Default Users
```bash
curl -X POST http://localhost:8000/api/v1/auth/init-users
```

**Expected Response:**
```json
{
  "message": "Default users created successfully",
  "admin_email": "admin@oliver.agency",
  "user_email": "user@oliver.agency"
}
```

### Step 2: Test Health Check
```bash
curl http://localhost:8000/health
```

**Expected Response:**
```json
{
  "status": "healthy",
  "version": "2.0.0"
}
```

## Authentication APIs

### 1. Admin Login
```bash
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "admin@oliver.agency",
    "password": "admin123"
  }'
```

**Expected Response:**
```json
{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer",
  "user": {
    "id": "...",
    "email": "admin@oliver.agency",
    "role": "admin",
    "is_active": true,
    "index_access": [...]
  }
}
```

**Save the admin token** (paste the full `access_token` value from the response):
```bash
ADMIN_TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
```

### 2. User Login
```bash
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "user@oliver.agency",
    "password": "user123"
  }'
```

**Save the user token** (paste the full `access_token` value from the response):
```bash
USER_TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
```
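Copying tokens by hand works, but a script can capture them directly. This is a minimal sketch with a hypothetical `login_token` helper, under two assumptions: the login response has the shape shown above (a top-level `access_token` string), and the credentials contain no characters that need JSON escaping. It extracts the field with `sed` so it needs nothing beyond `curl`; the Complete Test Workflow section later in this guide uses `jq` for the same job, which is more robust.

```bash
#!/usr/bin/env bash
# Hypothetical helper: log in and print the access token from the response.
# Assumes the response has a top-level "access_token" string (see above) and
# that the credentials need no JSON escaping (no double quotes or backslashes).
API_URL=${API_URL:-http://localhost:8000}

login_token() {
  local email=$1 password=$2 body
  body=$(printf '{"email": "%s", "password": "%s"}' "$email" "$password")
  curl -s -X POST "$API_URL/api/v1/auth/login" \
    -H "Content-Type: application/json" \
    -d "$body" |
    sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Usage: `ADMIN_TOKEN=$(login_token admin@oliver.agency admin123)` and `USER_TOKEN=$(login_token user@oliver.agency user123)`. An empty result means the login failed or the response did not contain an `access_token` field.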
### 3. Get Current User Info
```bash
curl -X GET http://localhost:8000/api/v1/auth/me \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

## Admin APIs

### User Management

#### 1. Get All Users
```bash
curl -X GET http://localhost:8000/api/v1/admin/users \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

#### 2. Create New User
```bash
curl -X POST http://localhost:8000/api/v1/admin/users \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "testuser@example.com",
    "password": "SecurePass123",
    "role": "user",
    "is_active": true
  }'
```

**Save the user ID from the response** (the value below is only an example):
```bash
NEW_USER_ID="686ec44005b0398525fde787"
```

#### 3. Update User
```bash
curl -X PUT http://localhost:8000/api/v1/admin/users/$NEW_USER_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "updated_testuser@example.com",
    "role": "admin",
    "is_active": false
  }'
```

#### 4. Delete User
```bash
curl -X DELETE http://localhost:8000/api/v1/admin/users/$NEW_USER_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

### Index Management

#### 1. Get All Indices
```bash
curl -X GET http://localhost:8000/api/v1/admin/indices \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

#### 2. Create New Index
```bash
curl -X POST http://localhost:8000/api/v1/admin/indices/create \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -F "name=Test Contract Index" \
  -F "description=Index for testing contract analysis" \
  -F "chunk_size=1000" \
  -F "chunk_overlap=200"
```

**Save the index ID from the response** (the value below is only an example):
```bash
TEST_INDEX_ID="test-index-2025-07-09-abc123"
```

#### 3. Delete Index
The document and access examples that follow reuse `$TEST_INDEX_ID`, so run this step last.
```bash
curl -X DELETE http://localhost:8000/api/v1/admin/indices/$TEST_INDEX_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

### Document Management

#### 1. Upload Single Document
```bash
curl -X POST http://localhost:8000/api/v1/admin/documents/upload-single \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -F "file=@/path/to/your/document.pdf" \
  -F "index_id=$TEST_INDEX_ID" \
  -F "custom_name=Test Contract Document"
```

#### 2. Upload Multiple Documents
```bash
curl -X POST http://localhost:8000/api/v1/admin/documents/upload-multiple \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -F "files=@/path/to/document1.pdf" \
  -F "files=@/path/to/document2.pdf" \
  -F "index_id=$TEST_INDEX_ID" \
  -F "base_name=Contract Batch"
```
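The Notes at the end of this guide list the accepted formats (PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF) and a 50MB default size limit, so checking a file locally before uploading can save a slow round trip. The hypothetical `check_upload_file` helper below mirrors those rules. It assumes the server decides by file extension (case-insensitively) and that "50MB" means 50 × 1024 × 1024 bytes; neither is confirmed here, so treat a pass as a hint rather than a guarantee.

```bash
#!/usr/bin/env bash
# Hypothetical client-side pre-upload check mirroring the documented rules.
# Assumption: "50MB" means 50 * 1024 * 1024 bytes; override MAX_UPLOAD_BYTES if not.
MAX_UPLOAD_BYTES=${MAX_UPLOAD_BYTES:-$((50 * 1024 * 1024))}

check_upload_file() {
  local path=$1 name ext size
  if [ ! -f "$path" ]; then
    echo "not a regular file: $path" >&2
    return 1
  fi
  # Extension of the file name (not the directory), lower-cased; "" if none.
  name=${path##*/}
  ext=${name##*.}
  [ "$ext" = "$name" ] && ext=""
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case $ext in
    pdf|docx|doc|txt|csv|json|html|md|rtf) ;;
    *) echo "unsupported format: ${ext:-<none>} ($path)" >&2; return 1 ;;
  esac
  # wc -c counts bytes; tr strips the padding some wc implementations add.
  size=$(wc -c < "$path" | tr -d ' ')
  if [ "$size" -gt "$MAX_UPLOAD_BYTES" ]; then
    echo "too large: $size bytes > $MAX_UPLOAD_BYTES ($path)" >&2
    return 1
  fi
}
```

For example, `check_upload_file /path/to/document1.pdf && check_upload_file /path/to/document2.pdf` before running the multi-file upload.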
#### 3. Get Index Documents
```bash
curl -X GET http://localhost:8000/api/v1/admin/documents/$TEST_INDEX_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

**Save a document ID from the response** (the value below is only an example):
```bash
DOCUMENT_ID="686ebfa705b0398525fde785"
```

#### 4. Reprocess Document
```bash
curl -X POST http://localhost:8000/api/v1/admin/documents/$DOCUMENT_ID/reprocess \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

#### 5. Delete Document
The user document examples that follow reuse `$DOCUMENT_ID`, so run this step last.
```bash
curl -X DELETE http://localhost:8000/api/v1/admin/documents/$DOCUMENT_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

### Index Access Management

These examples assume `USER_ID` holds the target user's ID, for example one listed by **Get All Users** (step 5 of the Complete Test Workflow shows how to look it up with `jq`).

#### 1. Grant Index Access to User
```bash
curl -X POST http://localhost:8000/api/v1/admin/users/$USER_ID/grant-index-access \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "index_id": "'$TEST_INDEX_ID'"
  }'
```

#### 2. Revoke Index Access from User
```bash
curl -X POST http://localhost:8000/api/v1/admin/users/$USER_ID/revoke-index-access \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "index_id": "'$TEST_INDEX_ID'"
  }'
```

#### 3. Grant All Indices Access
```bash
curl -X POST http://localhost:8000/api/v1/admin/grant-all-indices/$USER_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

### System Monitoring

#### 1. Get System Statistics
```bash
curl -X GET http://localhost:8000/api/v1/admin/stats \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

#### 2. Get Processing Status
```bash
curl -X GET http://localhost:8000/api/v1/admin/documents/processing-status \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

#### 3. Process Pending Documents
```bash
curl -X POST http://localhost:8000/api/v1/admin/documents/process-pending \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

### Admin RAG Query
```bash
curl -X POST http://localhost:8000/api/v1/admin/chat/query \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -F "query=What are the key terms of this contract?" \
  -F "index_id=$TEST_INDEX_ID" \
  -F "top_k=5"
```

## User APIs

### Index Access

#### 1. Get User's Indices
```bash
curl -X GET http://localhost:8000/api/v1/indices/ \
  -H "Authorization: Bearer $USER_TOKEN"
```

#### 2. Get Specific Index
```bash
curl -X GET http://localhost:8000/api/v1/indices/$TEST_INDEX_ID \
  -H "Authorization: Bearer $USER_TOKEN"
```

#### 3. Create User Index
```bash
curl -X POST http://localhost:8000/api/v1/indices/create \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Contract Index",
    "description": "Personal contract analysis index"
  }'
```

The user examples that follow use `$USER_INDEX_ID`; set it to the ID of an index the user can access, such as the one created here.

### Document Operations

#### 1. Upload Document to Index
```bash
curl -X POST http://localhost:8000/api/v1/documents/upload \
  -H "Authorization: Bearer $USER_TOKEN" \
  -F "file=@/path/to/contract.pdf" \
  -F "index_id=$USER_INDEX_ID"
```

#### 2. Get Documents by Index
```bash
curl -X GET http://localhost:8000/api/v1/documents/index/$USER_INDEX_ID \
  -H "Authorization: Bearer $USER_TOKEN"
```

#### 3. Get Document Details
```bash
curl -X GET http://localhost:8000/api/v1/documents/$DOCUMENT_ID \
  -H "Authorization: Bearer $USER_TOKEN"
```
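Document processing is asynchronous (pending → processing → completed/failed, per the project notes in CLAUDE.md), so a script that queries right after uploading may see incomplete results. A polling loop like the hypothetical `wait_for_document` below can bridge the gap. It assumes the document-details response includes a `processing_status` field with those values, as the summary response in the next example does; adjust the field name if your backend returns something else.

```bash
#!/usr/bin/env bash
# Hypothetical polling helper. Assumes GET /api/v1/documents/{id} returns
# JSON containing "processing_status": "<state>".
API_URL=${API_URL:-http://localhost:8000}

# Usage: wait_for_document DOC_ID TOKEN [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Exit status: 0 = completed, 1 = failed, 2 = timed out.
wait_for_document() {
  local doc_id=$1 token=$2 timeout_s=${3:-300} interval_s=${4:-5}
  local waited=0 status=""
  while [ "$waited" -lt "$timeout_s" ]; do
    status=$(curl -s "$API_URL/api/v1/documents/$doc_id" \
      -H "Authorization: Bearer $token" |
      sed -n 's/.*"processing_status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
      head -n 1)
    case $status in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed" >&2; return 1 ;;
    esac
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  echo "timed out after ${timeout_s}s (last status: ${status:-unknown})" >&2
  return 2
}
```

For example, `wait_for_document "$DOCUMENT_ID" "$USER_TOKEN" 600 10` waits up to ten minutes, checking every ten seconds.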
#### 4. Get Document Summary (AI)
```bash
curl -X GET http://localhost:8000/api/v1/documents/$DOCUMENT_ID/summary \
  -H "Authorization: Bearer $USER_TOKEN"
```

**Expected Response:**
```json
{
  "document_id": "686ebfa705b0398525fde785",
  "filename": "Contract Document.pdf",
  "summary": "This document is a service agreement between...",
  "processing_status": "completed",
  "generated_at": "2025-07-09T19:41:12.301719"
}
```

#### 5. Download Document
```bash
curl -X GET http://localhost:8000/api/v1/documents/$DOCUMENT_ID/download \
  -H "Authorization: Bearer $USER_TOKEN" \
  --output downloaded_document.pdf
```

#### 6. Delete Document
```bash
curl -X DELETE http://localhost:8000/api/v1/documents/$DOCUMENT_ID \
  -H "Authorization: Bearer $USER_TOKEN"
```

### Chat and RAG

#### 1. Query Documents (RAG)
```bash
curl -X POST http://localhost:8000/api/v1/chat/query \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the payment terms in this contract?",
    "index_id": "'$USER_INDEX_ID'",
    "top_k": 5
  }'
```

**Expected Response:**
```json
{
  "response": "Based on the contract analysis, the payment terms include...",
  "cached": false,
  "response_time": 2.34,
  "debug_info": {
    "sources": [...],
    "context_used": true,
    "context_messages_count": 3
  }
}
```

#### 2. Get Chat History
```bash
curl -X GET http://localhost:8000/api/v1/chat/history/$USER_INDEX_ID \
  -H "Authorization: Bearer $USER_TOKEN"
```

#### 3. Clear Chat History
```bash
curl -X DELETE http://localhost:8000/api/v1/chat/history/$USER_INDEX_ID \
  -H "Authorization: Bearer $USER_TOKEN"
```

#### 4. Get Index Chat Status
```bash
curl -X GET http://localhost:8000/api/v1/chat/status/$USER_INDEX_ID \
  -H "Authorization: Bearer $USER_TOKEN"
```

## Permission Testing

### Test User Access Restrictions

#### 1. User Trying to Access Admin Endpoint (Should Fail)
```bash
curl -X GET http://localhost:8000/api/v1/admin/users \
  -H "Authorization: Bearer $USER_TOKEN"
```

**Expected Response:**
```json
{
  "detail": "Not enough permissions"
}
```

#### 2. User Accessing Unauthorized Index (Should Fail)
Replace `unauthorized-index-id` with the ID of an index the user has not been granted access to.
```bash
curl -X GET http://localhost:8000/api/v1/documents/index/unauthorized-index-id \
  -H "Authorization: Bearer $USER_TOKEN"
```

#### 3. Admin Accessing Any Resource (Should Work)
Set `ANY_INDEX_ID` to the ID of any existing index.
```bash
curl -X GET http://localhost:8000/api/v1/documents/index/$ANY_INDEX_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```
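The permission checks above are easier to automate by comparing HTTP status codes instead of reading response bodies. The hypothetical `expect_status` helper below uses curl's `-w '%{http_code}'` write-out option to capture only the status. The guide does not state the exact codes, so the values in the usage note (403 for a regular user calling an admin endpoint, 200 for the admin) are assumptions based on the Troubleshooting list; adjust them to what your backend actually returns.

```bash
#!/usr/bin/env bash
# Hypothetical helper: send a request and compare its HTTP status code.
# Usage: expect_status EXPECTED METHOD URL TOKEN
expect_status() {
  local expected=$1 method=$2 url=$3 token=$4 actual
  # -o /dev/null discards the body; -w prints only the status code.
  actual=$(curl -s -o /dev/null -w '%{http_code}' -X "$method" "$url" \
    -H "Authorization: Bearer $token")
  if [ "$actual" = "$expected" ]; then
    echo "PASS $method $url -> $actual"
  else
    echo "FAIL $method $url -> $actual (expected $expected)"
    return 1
  fi
}
```

For example, `expect_status 403 GET http://localhost:8000/api/v1/admin/users "$USER_TOKEN"` should print `PASS` when the regular user is correctly blocked, and `expect_status 200 GET http://localhost:8000/api/v1/admin/users "$ADMIN_TOKEN"` should pass for the admin.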
## Frontend Testing

### 1. Access Frontend
Open `http://localhost:3000` in a browser.

### 2. Test Login Flow
1. Navigate to `http://localhost:3000/login`
2. Log in with the admin credentials: `admin@oliver.agency` / `admin123`
3. Verify that the dashboard loads
4. Check that the admin panel appears in the sidebar

### 3. Test Admin Panel
1. Navigate to `http://localhost:3000/dashboard/admin`
2. Test user management (create, edit, delete)
3. Test index access management
4. Verify the system statistics

### 4. Test User Features
1. Log in as the regular user: `user@oliver.agency` / `user123`
2. Test document upload and management
3. Test AI summary generation
4. Test the chat interface with documents

## Error Testing

### 1. Invalid Authentication
```bash
curl -X GET http://localhost:8000/api/v1/admin/users \
  -H "Authorization: Bearer invalid_token"
```

### 2. Missing Required Fields
```bash
curl -X POST http://localhost:8000/api/v1/admin/users \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "incomplete@user.com"
  }'
```

### 3. Duplicate Email
```bash
curl -X POST http://localhost:8000/api/v1/admin/users \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "email": "admin@oliver.agency",
    "password": "test123",
    "role": "user"
  }'
```

## Performance Testing

### 1. Large File Upload
```bash
curl -X POST http://localhost:8000/api/v1/documents/upload \
  -H "Authorization: Bearer $USER_TOKEN" \
  -F "file=@/path/to/large_document.pdf" \
  -F "index_id=$USER_INDEX_ID" \
  --max-time 300
```

### 2. Concurrent Chat Queries
```bash
# Run multiple queries simultaneously
for i in {1..5}; do
  curl -X POST http://localhost:8000/api/v1/chat/query \
    -H "Authorization: Bearer $USER_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "query": "Query '$i': What is the contract about?",
      "index_id": "'$USER_INDEX_ID'"
    }' &
done
wait  # also waits for any other background jobs in this shell, such as servers started with &
```
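The loop above fires the requests but does not show whether they succeeded or how long each took. The variant below records curl's `%{http_code}` and `%{time_total}` write-out values per request and fails if any request did not return 200. It is a sketch: the `run_concurrent_queries` name is hypothetical, it waits only on the PIDs it started (so servers running in the background of the same shell do not block it), and it builds each JSON body with `printf`, so the index ID must not contain double quotes.

```bash
#!/usr/bin/env bash
# Hypothetical concurrency test that reports per-request status and timing.
# Usage: run_concurrent_queries COUNT INDEX_ID TOKEN
API_URL=${API_URL:-http://localhost:8000}

run_concurrent_queries() {
  local n=${1:-5} index_id=$2 token=$3 outdir i failures
  local -a pids=()
  outdir=$(mktemp -d)
  for i in $(seq 1 "$n"); do
    curl -s -o /dev/null -w '%{http_code} %{time_total}\n' \
      -X POST "$API_URL/api/v1/chat/query" \
      -H "Authorization: Bearer $token" \
      -H "Content-Type: application/json" \
      -d "$(printf '{"query": "Query %s: What is the contract about?", "index_id": "%s"}' "$i" "$index_id")" \
      > "$outdir/$i" &
    pids+=("$!")
  done
  wait "${pids[@]}"  # only the requests started above
  for i in $(seq 1 "$n"); do
    echo "request $i: $(cat "$outdir/$i")"
  done
  # Count result lines that do not start with a 200 status.
  failures=$(cat "$outdir"/* | grep -vc '^200 ')
  rm -rf "$outdir"
  [ "$failures" -eq 0 ]
}
```

For example, `run_concurrent_queries 5 "$USER_INDEX_ID" "$USER_TOKEN"` prints one `request N: <status> <seconds>` line per query.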
## Complete Test Workflow

The captured values in this workflow are extracted with `jq`, which must be installed.

### 1. Setup Test Environment
```bash
# Start services
cd /path/to/backend && uvicorn app.main:app --reload --host 0.0.0.0 --port 8000 &
cd /path/to/frontend && npm run dev &

# Wait (up to 30 seconds) until the backend answers its health check
for _ in $(seq 1 30); do curl -sf http://localhost:8000/health > /dev/null && break; sleep 1; done

# Initialize users
curl -X POST http://localhost:8000/api/v1/auth/init-users
```

### 2. Login and Get Tokens
```bash
ADMIN_TOKEN=$(curl -s -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "admin@oliver.agency", "password": "admin123"}' \
  | jq -r '.access_token')

USER_TOKEN=$(curl -s -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "user@oliver.agency", "password": "user123"}' \
  | jq -r '.access_token')
```

### 3. Create Test Index
```bash
INDEX_RESPONSE=$(curl -s -X POST http://localhost:8000/api/v1/admin/indices/create \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -F "name=API Test Index" \
  -F "description=Index for API testing")

TEST_INDEX_ID=$(echo "$INDEX_RESPONSE" | jq -r '.index_id')
```

### 4. Upload and Process Document
Processing runs asynchronously; check **Get Processing Status** (under System Monitoring) until the document has finished before querying it in step 6.
```bash
curl -X POST http://localhost:8000/api/v1/admin/documents/upload-single \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -F "file=@/path/to/test_contract.pdf" \
  -F "index_id=$TEST_INDEX_ID" \
  -F "custom_name=Test Contract"
```

### 5. Grant User Access
```bash
USER_ID=$(curl -s -X GET http://localhost:8000/api/v1/admin/users \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  | jq -r '.[] | select(.email=="user@oliver.agency") | .id')

curl -X POST http://localhost:8000/api/v1/admin/users/$USER_ID/grant-index-access \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"index_id": "'$TEST_INDEX_ID'"}'
```

### 6. Test User Chat
```bash
curl -X POST http://localhost:8000/api/v1/chat/query \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Summarize this contract",
    "index_id": "'$TEST_INDEX_ID'"
  }'
```

### 7. Test AI Summary
```bash
DOCUMENT_ID=$(curl -s -X GET http://localhost:8000/api/v1/admin/documents/$TEST_INDEX_ID \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  | jq -r '.documents[0].id')

curl -X GET http://localhost:8000/api/v1/documents/$DOCUMENT_ID/summary \
  -H "Authorization: Bearer $USER_TOKEN"
```

## Troubleshooting

### Common Issues

1. **401 Unauthorized**: Check that the token is valid, has not expired (tokens last `JWT_EXPIRE_MINUTES`, 30 in the sample backend config), and is sent as `Authorization: Bearer <token>`
2. **403 Forbidden**: Verify that the user has the required role or index access
3. **404 Not Found**: Ensure the resource exists and the user has access to it
4. **422 Validation Error**: Check the request body format and required fields
5. **500 Internal Server Error**: Check backend logs and OpenAI API key
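An expired token is a frequent cause of a 401 during a long test session, since the sample backend config sets `JWT_EXPIRE_MINUTES=30`. A JWT's payload is base64url-encoded JSON, so its expiry can be read locally without contacting the server. This sketch assumes the backend includes the standard `exp` claim (seconds since the Unix epoch, as defined in RFC 7519), which is common for JWT libraries but not confirmed by this guide. It does not verify the signature, and `jwt_exp` and `jwt_seconds_left` are hypothetical names.

```bash
#!/usr/bin/env bash
# Print the "exp" claim (Unix seconds) from a JWT's payload, or nothing
# if the claim is absent. The signature is NOT checked.
jwt_exp() {
  local payload pad
  # The payload is the second dot-separated segment; map base64url to base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it so base64 -d accepts the input.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  payload+=$(printf '%*s' "$pad" '' | tr ' ' '=')
  printf '%s' "$payload" | base64 -d 2>/dev/null |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print how many seconds a token has left (negative if already expired).
jwt_seconds_left() {
  local exp
  exp=$(jwt_exp "$1")
  if [ -z "$exp" ]; then
    echo "no exp claim" >&2
    return 1
  fi
  echo $(( exp - $(date +%s) ))
}
```

For example, `jwt_seconds_left "$ADMIN_TOKEN"` printing a negative number means the token has expired and you need to log in again. On older macOS versions, `base64 -d` may need to be `base64 -D`.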
### Debug Commands

```bash
# Follow backend logs (assumes uvicorn output is redirected to backend.log)
tail -f backend.log

# Check backend health
curl http://localhost:8000/health

# Verify a token (substitute $USER_TOKEN to check the user token)
curl -X GET http://localhost:8000/api/v1/auth/me \
  -H "Authorization: Bearer $ADMIN_TOKEN"

# Check that OPENAI_API_KEY is set in this shell without printing the secret
# (the backend may read it from its .env file instead)
[ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set" || echo "OPENAI_API_KEY is not set"
```

## Notes

- Replace `/path/to/...` placeholders such as `/path/to/your/document.pdf` with actual file paths
- Replace placeholder IDs with actual IDs from responses
- Ensure all environment variables are properly set
- All timestamps are in UTC (they may be returned without a timezone suffix, as in the summary example)
- File uploads support PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, and RTF formats
- Maximum file size is 50MB (configurable)
297
CLAUDE.md
Normal file
@@ -0,0 +1,297 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This repository contains Contract Analysis Tool v2.0, a production-ready Retrieval-Augmented Generation (RAG) application for contract analysis and question answering over documents. It consists of a FastAPI backend and a React frontend.

## Architecture

**Stack:**
- **Backend:** FastAPI + MongoDB + Redis + ChromaDB
- **Frontend:** React + Vite + Tailwind CSS
- **AI/ML:** OpenAI GPT-4, LlamaIndex, ChromaDB for vector storage
- **Authentication:** JWT-based with role-based access control

**Data Flow:**
```
React Frontend → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                        ↓
                   Redis Cache
```

## Development Commands

### Backend (FastAPI)

**Start development server:**
```bash
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

**Install dependencies:**
```bash
cd backend
pip install -r requirements.txt
```

**Database setup:**
- MongoDB runs on port 27017
- Redis runs on port 6379
- The application creates collections and indexes automatically on startup

**Initialize default users:**
```bash
curl -X POST http://localhost:8000/api/v1/auth/init-users
```

**Health check:**
```bash
curl http://localhost:8000/health
```

### Frontend (React)

**Start development server:**
```bash
cd frontend
npm run dev
```

**Build for production:**
```bash
cd frontend
npm run build
```

**Lint code:**
```bash
cd frontend
npm run lint
```

**Install dependencies:**
```bash
cd frontend
npm install
```

### Docker Development

**Start all services:**
```bash
cd backend
docker-compose up -d
```

**Databases only (run the backend locally against them):**
```bash
cd backend
docker-compose up -d mongo redis
```

## Project Structure

### Backend (`/backend`)
- `app/main.py` - FastAPI application entry point
- `app/config/settings.py` - Environment configuration and database settings
- `app/api/v1/` - API endpoints (auth, documents, indices, chat, admin)
- `app/models/` - MongoDB data models (user, document, index, chat)
- `app/services/` - Business logic (document_processor, rag_service)
- `app/core/` - Core utilities (auth, security, cache)
- `app/utils/` - Helper utilities (file_utils)

### Frontend (`/frontend`)
- `src/App.jsx` - Main React application with routing
- `src/pages/` - Page components (Dashboard, DocumentManager, ChatInterface, AdminPanel)
- `src/components/` - Reusable UI components organized by feature
- `src/services/` - API service layer (authService, documentService, chatService, indexService)
- `src/context/` - React context providers (AuthContext)
- `src/utils/` - Frontend utilities and constants

## Key Features & Workflows

### Authentication System
- JWT-based authentication with role-based access (admin/user)
- Default users: `admin@oliver.agency`/`admin123`, `user@oliver.agency`/`user123`
- Protected routes with automatic token refresh

### Document Processing Pipeline
1. **Upload** → Document uploaded via the React frontend
2. **Process** → Backend processes it with LlamaIndex (PDF parsing, chunking)
3. **Index** → Embeddings stored in ChromaDB, metadata in MongoDB
4. **Query** → Natural language queries answered via the RAG system

### Index Management
- Users can create document indices to organize documents
- Role-based access control for index management
- ChromaDB handles vector storage; MongoDB stores metadata

### Chat System
- **Context-Aware Conversations**: The AI uses up to the 10 previous messages from the last 24 hours as context
- **Real-time Document Q&A**: RAG answers with source citations
- **Message Ordering**: Chronological display with correct timestamps
- **Conversation Continuity**: Responses reference previous context when relevant
- **Configurable top-k**: Number of retrieved results (3, 5, 10, 15) for query precision
- **Smart Caching**: Context-dependent responses aren't cached; simple queries are
- **Session Statistics**: Track response times, cache hit rates, and message counts

## Environment Configuration

### Backend (`.env`)
```env
# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis

# Redis
REDIS_URL=redis://localhost:6379

# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30

# OpenAI and LlamaParse
OPENAI_API_KEY=your-openai-api-key
LLAMAPARSE_API_KEY=your-llamaparse-api-key

# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices

# Cache
CACHE_ENABLED=true
CACHE_TTL=3600
```
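Missing or placeholder values in the backend `.env` tend to surface later as confusing 500 errors, so a small check can catch them up front. The hypothetical `check_env_file` helper below takes the required keys as arguments (for example, the keys from the sample above) and treats values beginning with `your-` as unfilled placeholders; that matches the sample's convention but is only a heuristic.

```bash
#!/usr/bin/env bash
# Hypothetical .env sanity check: reports keys that are missing, empty, or
# still set to a "your-..." placeholder (the convention used in the sample).
# Usage: check_env_file FILE KEY...
check_env_file() {
  local file=$1 key value status=0
  shift
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 1
  fi
  for key in "$@"; do
    # Last KEY=value line wins; everything after the first "=" is the value.
    value=$(grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-)
    # Strip one pair of surrounding double quotes, if present.
    value=${value#\"}
    value=${value%\"}
    if [ -z "$value" ]; then
      echo "$key: missing or empty"
      status=1
    elif [[ $value == your-* ]]; then
      echo "$key: still a placeholder"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_env_file backend/.env MONGODB_URL DATABASE_NAME REDIS_URL JWT_SECRET_KEY OPENAI_API_KEY` prints nothing and succeeds when every listed key has a real value; add `LLAMAPARSE_API_KEY` if your setup uses LlamaParse.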
### Frontend (`.env`)
```env
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
```

## API Endpoints

**Authentication:**
- `POST /api/v1/auth/login` - User login
- `POST /api/v1/auth/init-users` - Initialize default users

**Documents:**
- `POST /api/v1/documents/upload` - Upload documents to an index
- `GET /api/v1/documents/index/{index_id}` - List documents in an index

**Indices:**
- `POST /api/v1/indices/create` - Create a new document index
- `GET /api/v1/indices/` - List the user's indices

**Chat:**
- `POST /api/v1/chat/query` - Query documents in natural language

**Admin:**
- `GET /api/v1/admin/stats` - System statistics (admin only)
- `POST /api/v1/admin/documents/upload-single` - Upload a single document
- `POST /api/v1/admin/documents/upload-multiple` - Upload multiple documents
- `GET /api/v1/admin/documents/{index_id}` - Get an index's documents
- `POST /api/v1/admin/documents/{document_id}/reprocess` - Reprocess a document
- `DELETE /api/v1/admin/documents/{document_id}` - Delete a document
- `GET /api/v1/admin/indices` - Get all indices
- `POST /api/v1/admin/indices/create` - Create a new index
- `POST /api/v1/admin/chat/query` - RAG query interface

## Development Notes

### Database Connections
- MongoDB connection pooling is handled automatically
- The Redis connection has a fallback if Redis is unavailable
- ChromaDB indices are stored in the `./indices` directory

### File Handling
- Uploads are stored in a `./uploads/{index_id}/` directory structure
- Supported formats: PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF
- 50MB file size limit (configurable)
- Automatic file naming for batch uploads

### Caching Strategy
- Redis caches API responses for performance
- TTL configurable via the `CACHE_TTL` environment variable
- Cache keys include user context for security

### Document Processing
- Async processing with database status tracking
- Processing states: pending → processing → completed/failed
- Embedding states: pending → processing → completed/failed
- Automatic retry capability for failed documents
- Chunk count and vector ID tracking in MongoDB

### Vector Storage
- ChromaDB persistent storage in `./indices/chroma_db/`
- Collections named `index_{index_id}` for organization
- Metadata includes `document_id`, `chunk_index`, and `index_id`
- Configurable similarity search with top-k results

### Chat Context System
- **Context Window**: 24-hour rolling window with at most 10 previous messages
- **Smart Context**: AI uses conversation history for continuity and follow-up questions
- **Context Caching**: Responses that use context aren't cached (they are dynamic); simple queries are cached
- **Database Storage**: All messages stored with proper timestamps and context metadata
- **Context Display**: Frontend shows when context was used and how many previous messages it included
- **Session Management**: Track conversation statistics and context usage

### Message Ordering & Timestamps
- **Chronological Order**: Messages displayed in time sequence (oldest → newest)
- **Accurate Timestamps**: Server-side timestamp generation with UTC storage
- **Separate Timestamps**: User and assistant messages have distinct timestamps
- **Proper Database Storage**: `created_at`, `user_timestamp`, and `assistant_timestamp` fields
- **Frontend Display**: Localized timestamp formatting with date and time
- **Context Indicators**: Visual indicators show when the AI used previous conversation context

### Error Handling & Validation
- **Collection Validation**: Check that the ChromaDB collection exists before querying
- **Document Status Check**: Verify documents are fully processed before chat
- **Graceful Degradation**: Fallback responses when context generation fails
- **User-Friendly Errors**: Clear, actionable error messages with next steps
- **Progress Tracking**: Real-time status updates during document processing

### Progress Visualization
- **Upload Progress**: Real-time progress bars during file uploads
- **Processing Status**: Visual indicators for document processing stages
- **Embedding Progress**: Separate progress tracking for text processing and embedding
- **Success States**: Clear visual feedback when operations complete
- **Status Dashboard**: Comprehensive view of the document processing pipeline

### Security Features
- JWT token validation on protected routes
- Input validation with Pydantic schemas
- CORS configuration for frontend integration
- File upload validation and sanitization

## Testing

### Backend Testing
```bash
cd backend
# API documentation available at http://localhost:8000/docs
# Manual testing via Swagger UI
```

### Frontend Testing
- React components use modern hooks patterns
- Error boundaries for graceful error handling
- Loading states for better UX

## Migration Context

This application was migrated from PHP/Python to FastAPI/React. The migration delivered:
- Complete feature parity with the original application
- All document processing capabilities
- ChromaDB indices compatibility
- Enhanced performance and security
- Modern, responsive UI

The `MIGRATION_PLAN.md` file contains detailed information about the migration process and architecture decisions.
326
MIGRATION_PLAN.md
Normal file
@@ -0,0 +1,326 @@
# Migration Plan: PHP/Python → FastAPI/React

## Overview
This document outlines the complete migration strategy for transforming the current PHP/Python hybrid RAG application into a modern FastAPI backend with a React frontend.

## Current Application Analysis

### Existing Features
- **User Authentication**: Role-based access (admin/user) with SQLite storage
- **Document Management**: File uploads, processing, and indexing
- **RAG System**: LlamaIndex + ChromaDB for document retrieval
- **Contract Analysis**: GPT-4 powered contract field extraction
- **Chat Interface**: Natural language document Q&A
- **Caching System**: Response caching for performance
- **Index Management**: User-specific access control to document indices

### Current Architecture
```
PHP Frontend (Web UI) → Python Backend (Processing) → OpenAI API
          ↓                          ↓
      SQLite DB              ChromaDB Vectors
```

## New Architecture

### Target Architecture
```
React Frontend → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                        ↓
                   Redis Cache
```

## Project Structure

### Backend (FastAPI)
```
backend/
├── app/
│   ├── __init__.py
│   ├── main.py                       # FastAPI app entry point
│   ├── config/
│   │   ├── __init__.py
│   │   ├── settings.py               # Environment configuration
│   │   └── database.py               # MongoDB connection
│   ├── models/
│   │   ├── __init__.py
│   │   ├── user.py                   # User data models
│   │   ├── document.py               # Document models
│   │   ├── index.py                  # Index models
│   │   └── chat.py                   # Chat/query models
│   ├── schemas/
│   │   ├── __init__.py
│   │   ├── user.py                   # Pydantic schemas
│   │   ├── document.py
│   │   ├── index.py
│   │   └── chat.py
│   ├── api/
│   │   ├── __init__.py
│   │   ├── deps.py                   # Dependencies
│   │   └── v1/
│   │       ├── __init__.py
│   │       ├── auth.py               # Authentication endpoints
│   │       ├── documents.py          # Document management
│   │       ├── indices.py            # Index management
│   │       ├── chat.py               # Chat/query endpoints
│   │       └── admin.py              # Admin endpoints
│   ├── core/
│   │   ├── __init__.py
│   │   ├── auth.py                   # JWT authentication
│   │   ├── security.py               # Security utilities
│   │   └── cache.py                  # Redis caching
│   ├── services/
│   │   ├── __init__.py
│   │   ├── document_processor.py     # Document processing service
│   │   ├── rag_service.py            # RAG retrieval service
│   │   ├── index_service.py          # Index management service
│   │   └── openai_service.py         # OpenAI integration
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── file_utils.py             # File handling utilities
│   │   └── llama_utils.py            # LlamaIndex utilities
│   └── middleware/
│       ├── __init__.py
│       ├── cors.py                   # CORS middleware
│       └── logging.py                # Request logging
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── .env.example
```

### Frontend (React)
```
frontend/
├── public/
│   ├── index.html
│   └── favicon.ico
├── src/
│   ├── components/
│   │   ├── common/
│   │   │   ├── Header.jsx
│   │   │   ├── Sidebar.jsx
│   │   │   ├── Layout.jsx
│   │   │   └── LoadingSpinner.jsx
│   │   ├── auth/
│   │   │   ├── LoginForm.jsx
│   │   │   └── ProtectedRoute.jsx
│   │   ├── documents/
│   │   │   ├── DocumentUpload.jsx
│   │   │   ├── DocumentList.jsx
│   │   │   └── DocumentViewer.jsx
│   │   ├── chat/
│   │   │   ├── ChatInterface.jsx
│   │   │   ├── MessageList.jsx
│   │   │   └── MessageInput.jsx
│   │   ├── indices/
│   │   │   ├── IndexList.jsx
│   │   │   ├── IndexManager.jsx
│   │   │   └── CreateIndex.jsx
│   │   └── admin/
│   │       ├── UserManagement.jsx
│   │       └── SystemMonitor.jsx
│   ├── hooks/
│   │   ├── useAuth.js
│   │   ├── useDocuments.js
│   │   ├── useChat.js
│   │   └── useIndices.js
│   ├── services/
│   │   ├── api.js                    # Axios configuration
│   │   ├── authService.js            # Authentication API calls
│   │   ├── documentService.js        # Document API calls
│   │   ├── chatService.js            # Chat API calls
│   │   └── indexService.js           # Index API calls
│   ├── context/
│   │   ├── AuthContext.js
│   │   └── ThemeContext.js
│   ├── utils/
│   │   ├── constants.js
│   │   ├── helpers.js
│   │   └── validation.js
│   ├── styles/
│   │   ├── globals.css
│   │   └── components/
│   ├── App.jsx
│   ├── index.js
│   └── routes.js
├── package.json
├── tailwind.config.js
├── vite.config.js
└── .env.example
```

## Technology Stack

### Backend
- **FastAPI**: Modern, fast web framework for Python APIs
- **MongoDB**: Document database for user data and metadata
- **ChromaDB**: Vector database for document embeddings (kept from the current application)
- **Redis**: Caching layer for improved performance
- **Pydantic**: Data validation and serialization
- **JWT**: Token-based authentication
- **LlamaIndex**: RAG framework (kept from the current application)
- **OpenAI**: GPT-4 for analysis, plus OpenAI embedding models for document embeddings

### Frontend
- **React 18**: Modern React with hooks
- **Vite**: Fast build tool and dev server
- **Tailwind CSS**: Utility-first CSS framework
- **Axios**: HTTP client for API calls
- **React Router**: Client-side routing
- **React Hook Form**: Form handling
- **Zustand**: State management
- **React Query**: Server state management

## Migration Strategy

### Phase 1: Backend Foundation
1. Set up FastAPI project structure
2. Configure MongoDB connection
3. Implement user authentication with JWT
4. Create data models and schemas
5. Set up Redis caching

### Phase 2: Core Services
1. Port document processing pipeline
2. Implement RAG service with LlamaIndex
3. Create OpenAI integration service
4. Implement index management
5. Set up file upload handling

### Phase 3: API Endpoints
1. Authentication endpoints
2. Document management endpoints
3. Chat/query endpoints
4. Index management endpoints
5. Admin endpoints

### Phase 4: Frontend Development
1. Set up React project with Vite
2. Create authentication flow
3. Build document management interface
4. Implement chat interface
5. Create admin dashboard

### Phase 5: Integration & Testing
1. Connect frontend to backend APIs
2. Implement proper error handling
3. Add loading states and UX improvements
4. Performance optimization
5. Security hardening

### Phase 6: Deployment
1. Docker containerization
2. Environment configuration
3. Production deployment setup
4. Monitoring and logging

## Data Migration

### User Data
- Migrate from SQLite to MongoDB
- Transform user authentication to JWT
- Preserve user roles and permissions

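The SQLite-to-MongoDB step can be sketched as a function that reads the legacy `users` table and emits documents with the same fields the admin API writes when it creates a user (`email`, `hashed_password`, `role`, `is_active`, `index_access`, `created_at`, `updated_at`). The legacy column names (`email`, `password_hash`, `role`) are assumptions about the old schema, and `sqlite_users_to_mongo_docs` is a hypothetical helper.

```python
import sqlite3
from datetime import datetime, timezone


def sqlite_users_to_mongo_docs(conn: sqlite3.Connection) -> list[dict]:
    """Convert rows from a legacy SQLite `users` table into MongoDB-ready documents.

    Assumed legacy columns: email, password_hash, role. Adjust the SELECT
    to match the real table.
    """
    conn.row_factory = sqlite3.Row  # access columns by name
    now = datetime.now(timezone.utc)
    docs = []
    for row in conn.execute("SELECT email, password_hash, role FROM users ORDER BY rowid"):
        docs.append({
            "email": row["email"],
            # Only reusable if the new backend verifies the same hash format
            "hashed_password": row["password_hash"],
            # Unknown legacy roles fall back to the least-privileged role
            "role": row["role"] if row["role"] in ("admin", "user") else "user",
            "is_active": True,
            "index_access": [],  # legacy index permissions are not mapped here
            "created_at": now,
            "updated_at": now,
        })
    return docs
```

The resulting list could then be written with `insert_many` on the MongoDB `users` collection. Whether old password hashes can be reused depends on the hashing scheme the new backend expects.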
### Document Indices
- Keep existing ChromaDB indices
- Update index metadata in MongoDB
- Maintain document access permissions

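Per-user index permissions live in an `index_access` list on each user. In the admin API, granting uses MongoDB's `$addToSet` and revoking uses `$pull`. The pure-Python sketch below mirrors those semantics so the behavior can be reasoned about without a database. `grant`, `revoke`, and `can_query` are illustrative names, and the rule that admins can query every index is an assumption, not something this plan states.

```python
def grant(access: list[str], index_id: str) -> list[str]:
    """Mimic MongoDB $addToSet: append only if the id is not already present."""
    return access if index_id in access else access + [index_id]


def revoke(access: list[str], index_id: str) -> list[str]:
    """Mimic MongoDB $pull: remove every occurrence of the id."""
    return [i for i in access if i != index_id]


def can_query(role: str, access: list[str], index_id: str) -> bool:
    """Hypothetical check: admins see all indices, users only granted ones."""
    return role == "admin" or index_id in access
```

Because `$addToSet` is idempotent, repeating a grant request is harmless.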
### Configuration
- Environment variables migration
- API key management
- Cache configuration

## Key Improvements

### Performance
- Async/await throughout the backend
- Redis caching for API responses
- Optimized database queries
- React Query for client-side caching

### Security
- JWT-based authentication
- Input validation with Pydantic
- CORS configuration
- Rate limiting

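Rate limiting is listed as an improvement without a specified mechanism. One common technique is a token bucket. The class below is an illustrative in-process sketch: `TokenBucket` and its parameters are hypothetical, and a deployment with several workers would need shared state (for example in Redis) rather than per-process counters.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal in-process token-bucket rate limiter (illustrative only).

    capacity: maximum burst size; refill_rate: tokens added per second.
    """

    def __init__(self, capacity: int, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Injecting the clock keeps the limiter deterministic in tests. In a FastAPI app, a check like this would typically sit in a dependency or middleware keyed by user or client IP.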
### Scalability
- Microservice-ready architecture
- Database connection pooling
- Horizontal scaling support
- Load balancing ready

### Developer Experience
- Type hints throughout Python code
- Auto-generated API documentation from FastAPI
- Modern React patterns
- Hot reloading in development

### User Experience
- Modern, responsive UI
- Real-time updates
- Better error handling
- Improved performance

## Implementation Timeline

1. **Week 1**: Backend foundation and authentication
2. **Week 2**: Core services and API endpoints
3. **Week 3**: Frontend setup and basic components
4. **Week 4**: Integration and testing
5. **Week 5**: Deployment and optimization

## File Deletion Strategy

Files will be deleted progressively as new implementations are completed:

1. **Phase 1**: Remove PHP authentication files after JWT implementation
2. **Phase 2**: Remove PHP API files once the FastAPI endpoints are in place
3. **Phase 3**: Remove Python processing scripts after service implementation
4. **Phase 4**: Remove remaining PHP files after frontend completion
5. **Phase 5**: Clean up temporary files and documentation

## Environment Configuration

### Backend (.env)
```
# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis

# Redis
REDIS_URL=redis://localhost:6379

# Authentication
JWT_SECRET_KEY=your-secret-key
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30

# OpenAI
OPENAI_API_KEY=your-openai-key
LLAMAPARSE_API_KEY=your-llamaparse-key

# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
```

### Frontend (.env)
```
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
```

## Success Criteria

- [ ] Complete feature parity with current application
- [ ] Improved performance (faster load times, better caching)
- [ ] Modern, responsive UI
- [ ] Scalable architecture
- [ ] Comprehensive API documentation
- [ ] Security improvements
- [ ] Easy deployment and maintenance

This migration will modernize the application while maintaining all existing functionality and improving performance, security, and maintainability.
349
README.md
Normal file
@@ -0,0 +1,349 @@
# Contract Analysis Tool v2.0

A modern, production-ready Retrieval-Augmented Generation (RAG) application for intelligent contract analysis and document Q&A. Built with a FastAPI backend and a React frontend.

## 🚀 Features

- **Modern Architecture**: FastAPI + React + MongoDB + Redis
- **AI-Powered Analysis**: GPT-4 integration for contract analysis
- **Document Q&A**: Natural language queries with RAG
- **User Management**: Role-based access control
- **Real-time Processing**: Async document processing
- **Intelligent Caching**: Redis-based response caching
- **Scalable Design**: Microservice-ready architecture

## 🏗️ Architecture

```
React Frontend → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                        ↓
                   Redis Cache
```

## 📋 Prerequisites

- **Python 3.11+**
- **Node.js 18+**
- **MongoDB 7+**
- **Redis 7+**
- **OpenAI API Key**
- **LlamaParse API Key** (optional)

## 🛠️ Installation

### Option 1: Docker (Recommended)

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd llama-contracts-master
   ```

2. **Set up environment variables**
   ```bash
   # Backend
   cp backend/.env.example backend/.env
   # Edit backend/.env with your API keys

   # Frontend
   cp frontend/.env.example frontend/.env
   ```

3. **Start with Docker Compose**
   ```bash
   cd backend
   docker-compose up -d
   ```

4. **Start the frontend**
   ```bash
   # From the repository root
   cd frontend
   npm install
   npm run dev
   ```

### Option 2: Manual Setup

#### Backend Setup

1. **Create a Python virtual environment**
   ```bash
   cd backend
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**
   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

4. **Start MongoDB and Redis** (Homebrew commands shown)
   ```bash
   # MongoDB
   brew services start mongodb/brew/mongodb-community

   # Redis
   brew services start redis
   ```

5. **Start the backend**
   ```bash
   uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
   ```

#### Frontend Setup

1. **Install dependencies**
   ```bash
   cd frontend
   npm install
   ```

2. **Set up environment variables**
   ```bash
   cp .env.example .env
   ```

3. **Start the development server**
   ```bash
   npm run dev
   ```

## 🔧 Configuration

### Backend Environment Variables

```env
# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis

# Redis
REDIS_URL=redis://localhost:6379

# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30

# OpenAI
OPENAI_API_KEY=your-openai-api-key
LLAMAPARSE_API_KEY=your-llamaparse-api-key

# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices

# Cache
CACHE_ENABLED=true
CACHE_TTL=3600
```

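Several of these values are not plain strings: `CORS_ORIGINS` is a JSON array, `CACHE_ENABLED` a boolean, and `CACHE_TTL` and `JWT_EXPIRE_MINUTES` integers. The sketch below shows one way to parse them from an environment mapping such as `os.environ`. `AppSettings` and `load_settings` are hypothetical and are not the backend's actual settings module, which may instead rely on Pydantic settings (those decode list fields from JSON strings in a similar way).

```python
import json
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppSettings:
    cors_origins: list[str]
    cache_enabled: bool
    cache_ttl: int  # seconds
    jwt_expire_minutes: int


def _parse_bool(value: str) -> bool:
    return value.strip().lower() in ("1", "true", "yes", "on")


def load_settings(env: Mapping[str, str]) -> AppSettings:
    """Parse a subset of the variables above; defaults mirror the example values."""
    return AppSettings(
        # CORS_ORIGINS is written as a JSON array, so decode it with json.loads
        cors_origins=json.loads(env.get("CORS_ORIGINS", '["http://localhost:3000"]')),
        cache_enabled=_parse_bool(env.get("CACHE_ENABLED", "true")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        jwt_expire_minutes=int(env.get("JWT_EXPIRE_MINUTES", "30")),
    )
```

Calling `load_settings(os.environ)` at startup makes a malformed value, such as a non-JSON `CORS_ORIGINS`, fail immediately instead of at request time.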
### Frontend Environment Variables

```env
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
```

## 🚀 Usage

### Swagger Testing
Method 1: Get a token via Swagger UI (easiest)

1. Go to http://localhost:8000/docs
2. First, initialize the default users by calling the `/api/v1/auth/init-users` endpoint.
3. Then call the `/api/v1/auth/login` endpoint with one of these credentials:

   Admin user:
   - email: admin@oliver.agency
   - password: admin123

   Regular user:
   - email: user@oliver.agency
   - password: user123

4. Copy the `access_token` from the response.
5. Click the **Authorize** button at the top of Swagger UI.
6. Enter: `Bearer YOUR_TOKEN_HERE`

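Outside Swagger UI, the same token goes in an `Authorization: Bearer <token>` header, as in the curl examples under Monitoring. Below is a minimal standard-library sketch. The `authed_request` helper and the `API_URL` constant are illustrative and not part of the project.

```python
import urllib.request

API_URL = "http://localhost:8000"  # assumed local backend address


def authed_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying the access_token returned by /api/v1/auth/login.

    Pass the result to urllib.request.urlopen() to actually send it.
    """
    return urllib.request.Request(
        API_URL + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

For example, `urllib.request.urlopen(authed_request("/api/v1/admin/stats", token))` would fetch the admin statistics while the backend is running.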
### Initial Setup

1. **Access the application**: http://localhost:3000
2. **Initialize default users** (first time only):
   ```bash
   curl -X POST http://localhost:8000/api/v1/auth/init-users
   ```

### Default Credentials

- **Admin**: `admin@oliver.agency` / `admin123`
- **User**: `user@oliver.agency` / `user123`

### Workflow

1. **Log in** with admin or user credentials
2. **Create an Index** for your document collection
3. **Upload Documents** to the index
4. **Chat** with your documents using natural language
5. **Manage Users** (admin only)

## 📚 API Documentation

- **FastAPI Docs**: http://localhost:8000/docs (development only)
- **ReDoc**: http://localhost:8000/redoc (development only)

### Key Endpoints

- `POST /api/v1/auth/login` - User authentication
- `POST /api/v1/indices/create` - Create document index
- `POST /api/v1/documents/upload` - Upload documents
- `POST /api/v1/chat/query` - Query documents
- `GET /api/v1/admin/stats` - System statistics

## 🔒 Security Features

- **JWT Authentication** with role-based access
- **Input Validation** with Pydantic schemas
- **CORS Protection** for frontend integration
- **File Upload Validation** with type/size checks
- **Rate Limiting** (configurable)
- **Environment Variable Protection**

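With `JWT_ALGORITHM=HS256`, a token is three base64url segments (header, claims, signature), and the signature is an HMAC-SHA256 over the first two using `JWT_SECRET_KEY`. The standard-library sketch below shows what signing and verifying involve, including an `exp` claim derived from `JWT_EXPIRE_MINUTES`. It is an illustration only. The backend presumably uses a maintained JWT library, and this is not its code. The helper names and the `sub`/`role` claims in the example are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWT segments use base64url encoding with the "=" padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the stripped padding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_hs256(claims: dict, secret: str, expire_minutes: int = 30,
                 now: Optional[float] = None) -> str:
    """Return an HS256-signed JWT whose `exp` is expire_minutes after `now`."""
    issued = int(time.time() if now is None else now)
    payload = {**claims, "exp": issued + expire_minutes * 60}
    header_b64 = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload_b64 = _b64url(json.dumps(payload).encode())
    signing_input = f"{header_b64}.{payload_b64}"
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Verify algorithm, signature, and expiry; raise ValueError on any failure."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes through timing
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

Rejecting any header whose `alg` is not the expected value is the main design point. Without that check, an attacker could present a token signed with a weaker or empty algorithm.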
## ⚡ Performance Features

- **Async Processing** throughout the backend
- **Redis Caching** for API responses
- **Vector Search** with ChromaDB
- **Connection Pooling** for databases
- **Optimized Queries** with MongoDB indexes

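Response caching with `CACHE_TTL` means each stored answer is reused until it is `CACHE_TTL` seconds old, then recomputed. The in-memory class below illustrates that contract. With Redis, the equivalent is usually a `SET` with an `EX` expiry. `TTLCache` and `get_or_compute` are hypothetical names, and this is not the project's `core/cache.py`.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """In-memory stand-in for a Redis response cache (illustrative only).

    Entries expire `ttl` seconds after being stored, mirroring CACHE_TTL.
    """

    def __init__(self, ttl: float = 3600, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the stored deadline has not passed yet
        value = compute()  # miss or expired: recompute and store with a fresh deadline
        self._store[key] = (now + self.ttl, value)
        return value
```

In practice, the cache key for a chat answer would need to include the index and the question, so that different users' queries on different indices never collide.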
## 🧪 Development

### Backend Development

```bash
cd backend
source venv/bin/activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

### Frontend Development

```bash
cd frontend
npm run dev
```

### Database Migration

The application automatically creates database collections and indexes on startup.

## 📊 Monitoring

### Health Check

```bash
curl http://localhost:8000/health
```

### System Stats (Admin)

```bash
curl -H "Authorization: Bearer <token>" http://localhost:8000/api/v1/admin/stats
```

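The stats endpoint returns nested counts: `users` (`total`, `active`, `admins`), `indices` (`total`), `documents` (`total`, `pending`, `processing`, `completed`, `failed`), and `chat_messages` (`total`). The hypothetical helper below turns that JSON into a one-line summary, which is handy for spotting a processing backlog or failures at a glance.

```python
import json


def summarize_stats(raw: str) -> str:
    """Condense the JSON from GET /api/v1/admin/stats into one line."""
    stats = json.loads(raw)
    docs = stats["documents"]
    backlog = docs["pending"] + docs["processing"]
    # Guard against division by zero on an empty system
    failed_pct = 100 * docs["failed"] / docs["total"] if docs["total"] else 0.0
    return (
        f"users={stats['users']['total']} (active {stats['users']['active']}), "
        f"indices={stats['indices']['total']}, docs={docs['total']} "
        f"(backlog {backlog}, failed {failed_pct:.1f}%), "
        f"messages={stats['chat_messages']['total']}"
    )
```

Given the curl output saved to a file, `summarize_stats(open("stats.json").read())` would print the summary.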
## 🐛 Troubleshooting

### Common Issues

1. **MongoDB Connection Failed**
   - Ensure MongoDB is running: `brew services start mongodb/brew/mongodb-community`
   - Check the connection string (`MONGODB_URL`) in `.env`

2. **Redis Connection Failed**
   - Ensure Redis is running: `brew services start redis`
   - The application continues without caching if Redis is unavailable

3. **OpenAI API Errors**
   - Verify the API key in the backend `.env`
   - Check your API quota and billing

4. **File Upload Issues**
   - Check file size limits (50MB default)
   - Verify that the file type is supported
   - Ensure the upload directory (`UPLOAD_DIR`) is writable

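When diagnosing upload failures, it can help to pre-check files locally before sending them. The sketch below applies the 50MB default, interpreted as 50 × 1024 × 1024 bytes, which is an assumption. It also uses a hypothetical extension allow-list, because the supported types are not listed here. Check the backend for the real limits.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumes "50MB" means 50 MiB
# Hypothetical allow-list; the backend's actual supported types may differ
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems with an upload (an empty list means it looks OK)."""
    problems = []
    suffix = Path(filename).suffix.lower()  # case-insensitive extension check
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {size_bytes / (1024 * 1024):.1f}MB > 50MB")
    return problems
```

Returning every problem, rather than stopping at the first one, lets a user fix both the type and the size in one pass.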
### Logs

- **Backend logs**: Console output from uvicorn
- **Frontend logs**: Browser console
- **Database logs**: MongoDB logs in data directory

## 🔄 Migration from v1.0

The new system provides complete feature parity with the original PHP application:

- ✅ All PHP functionality migrated to FastAPI
- ✅ SQLite data can be migrated to MongoDB
- ✅ Existing ChromaDB indices are compatible
- ✅ All document processing features preserved
- ✅ Enhanced performance and security

## 🚀 Deployment

### Production Deployment

1. **Set production environment variables**
2. **Use production database URLs**
3. **Enable HTTPS with SSL certificates**
4. **Configure a reverse proxy (Nginx)**
5. **Set up monitoring and logging**
6. **Schedule regular MongoDB backups**

### Docker Production

```bash
docker-compose -f docker-compose.prod.yml up -d
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License.

## 🙏 Acknowledgments

- **OpenAI** - GPT-4 and embedding models
- **LlamaIndex** - RAG framework
- **ChromaDB** - Vector storage
- **FastAPI** - Modern Python web framework
- **React** - Frontend framework

---

**Built with ❤️ for intelligent contract analysis**
25
backend/.env.example
Normal file
@@ -0,0 +1,25 @@
# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis

# Redis
REDIS_URL=redis://localhost:6379

# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key-change-this-in-production
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30

# OpenAI
OPENAI_API_KEY=your-openai-api-key-here
LLAMAPARSE_API_KEY=your-llamaparse-api-key-here

# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices

# Cache
CACHE_ENABLED=true
CACHE_TTL=3600
25
backend/Dockerfile
Normal file
@@ -0,0 +1,25 @@
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create directories
RUN mkdir -p uploads indices

# Expose port
EXPOSE 8000

# Command to run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
1
backend/app/__init__.py
Normal file
@@ -0,0 +1 @@
# FastAPI Backend for Contract Analysis Tool
1
backend/app/api/__init__.py
Normal file
@@ -0,0 +1 @@
# API package
1
backend/app/api/v1/__init__.py
Normal file
@@ -0,0 +1 @@
# API v1 package
699
backend/app/api/v1/admin.py
Normal file
@@ -0,0 +1,699 @@
from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Form
from motor.motor_asyncio import AsyncIOMotorDatabase
from typing import List, Optional
from bson import ObjectId
from datetime import datetime
from urllib.parse import unquote

from ...config.database import get_database
from ...config.settings import settings
from ...core.auth import get_current_admin_user
from ...models.user import UserInDB, UserUpdate, UserRole
from ...models.document import DocumentInDB
from ...models.index import IndexInDB
from ...services.llama_processor import llama_processor

router = APIRouter()

@router.get("/users", response_model=List[dict])
async def get_all_users(
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get all users (admin only)"""
    users = []
    cursor = db.users.find({})

    async for user in cursor:
        user_obj = UserInDB(**user)
        users.append({
            "id": str(user_obj.id),
            "email": user_obj.email,
            "role": user_obj.role,
            "is_active": user_obj.is_active,
            "index_access": user_obj.index_access,
            "created_at": user_obj.created_at,
            "updated_at": user_obj.updated_at
        })

    return users

@router.post("/users")
async def create_user(
    user_data: dict,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Create a new user (admin only)"""
    try:
        from ...core.security import get_password_hash

        # Validate required fields so a missing key returns 400 instead of a 500 KeyError
        if not user_data.get("email") or not user_data.get("password"):
            raise HTTPException(status_code=400, detail="email and password are required")

        # Check if email already exists
        existing_user = await db.users.find_one({"email": user_data["email"]})
        if existing_user:
            raise HTTPException(status_code=400, detail="Email already registered")

        # Hash the password
        hashed_password = get_password_hash(user_data["password"])

        # Create user document
        new_user = {
            "email": user_data["email"],
            "hashed_password": hashed_password,
            "role": user_data.get("role", "user"),
            "is_active": user_data.get("is_active", True),
            "index_access": [],
            "created_at": datetime.utcnow(),
            "updated_at": datetime.utcnow()
        }

        # Insert user
        result = await db.users.insert_one(new_user)

        return {
            "message": "User created successfully",
            "user_id": str(result.inserted_id),
            "email": user_data["email"]
        }

    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error creating user: {str(e)}")

@router.put("/users/{user_id}")
async def update_user(
    user_id: str,
    user_update: UserUpdate,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Update a user (admin only)"""
    # Check if user exists
    user = await db.users.find_one({"_id": ObjectId(user_id)})
    if not user:
        raise HTTPException(status_code=404, detail="User not found")

    # Prepare update data
    update_data = {}
    if user_update.email is not None:
        # Check if email is already taken
        existing = await db.users.find_one({
            "email": user_update.email,
            "_id": {"$ne": ObjectId(user_id)}
        })
        if existing:
            raise HTTPException(status_code=400, detail="Email already in use")
        update_data["email"] = user_update.email

    if user_update.role is not None:
        update_data["role"] = user_update.role

    if user_update.is_active is not None:
        update_data["is_active"] = user_update.is_active

    if user_update.password is not None:
        from ...core.security import get_password_hash
        update_data["hashed_password"] = get_password_hash(user_update.password)

    if update_data:
        update_data["updated_at"] = datetime.utcnow()
        await db.users.update_one(
            {"_id": ObjectId(user_id)},
            {"$set": update_data}
        )

    return {"message": "User updated successfully"}

@router.delete("/users/{user_id}")
async def delete_user(
    user_id: str,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Delete a user (admin only)"""
    # Don't allow admin to delete themselves
    if str(admin_user.id) == user_id:
        raise HTTPException(status_code=400, detail="Cannot delete your own account")

    # Check if user exists
    user = await db.users.find_one({"_id": ObjectId(user_id)})
    if not user:
        raise HTTPException(status_code=404, detail="User not found")

    # Delete user
    await db.users.delete_one({"_id": ObjectId(user_id)})

    return {"message": "User deleted successfully"}

@router.post("/users/{user_id}/grant-index-access")
async def grant_index_access(
    user_id: str,
    request_data: dict,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Grant user access to an index"""
    index_id = request_data.get("index_id")
    if not index_id:
        raise HTTPException(status_code=400, detail="index_id is required")

    # Check if user exists
    user = await db.users.find_one({"_id": ObjectId(user_id)})
    if not user:
        raise HTTPException(status_code=404, detail="User not found")

    # Check if index exists
    index = await db.indices.find_one({"index_id": index_id})
    if not index:
        raise HTTPException(status_code=404, detail="Index not found")

    # Grant access ($addToSet avoids duplicate entries in index_access)
    await db.users.update_one(
        {"_id": ObjectId(user_id)},
        {"$addToSet": {"index_access": index_id}}
    )

    return {"message": "Index access granted successfully"}

@router.post("/users/{user_id}/revoke-index-access")
async def revoke_index_access(
    user_id: str,
    request_data: dict,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Revoke user access to an index"""
    index_id = request_data.get("index_id")
    if not index_id:
        raise HTTPException(status_code=400, detail="index_id is required")

    # Check if user exists
    user = await db.users.find_one({"_id": ObjectId(user_id)})
    if not user:
        raise HTTPException(status_code=404, detail="User not found")

    # Revoke access ($pull removes index_id from index_access)
    await db.users.update_one(
        {"_id": ObjectId(user_id)},
        {"$pull": {"index_access": index_id}}
    )

    return {"message": "Index access revoked successfully"}

@router.post("/grant-all-indices/{user_id}")
async def grant_all_indices_access(
    user_id: str,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Grant user access to all active indices"""
    # Check if user exists
    user = await db.users.find_one({"_id": ObjectId(user_id)})
    if not user:
        raise HTTPException(status_code=404, detail="User not found")

    # Get all active indices
    indices = []
    cursor = db.indices.find({"status": "active"})
    async for index in cursor:
        indices.append(index["index_id"])

    # Grant access to all indices ($set replaces the user's existing index_access list)
    await db.users.update_one(
        {"_id": ObjectId(user_id)},
        {"$set": {"index_access": indices}}
    )

    return {
        "message": "Access granted to all indices",
        "index_count": len(indices)
    }

@router.get("/stats")
async def get_system_stats(
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get system statistics"""
    # Count users
    total_users = await db.users.count_documents({})
    active_users = await db.users.count_documents({"is_active": True})
    admin_users = await db.users.count_documents({"role": UserRole.ADMIN})

    # Count indices
    total_indices = await db.indices.count_documents({"status": "active"})

    # Count documents
    total_documents = await db.documents.count_documents({})
    pending_documents = await db.documents.count_documents({"processing_status": "pending"})
    processing_documents = await db.documents.count_documents({"processing_status": "processing"})
    completed_documents = await db.documents.count_documents({"processing_status": "completed"})
    failed_documents = await db.documents.count_documents({"processing_status": "failed"})

    # Count chat messages
    total_messages = await db.chat_messages.count_documents({})

    return {
        "users": {
            "total": total_users,
            "active": active_users,
            "admins": admin_users
        },
        "indices": {
            "total": total_indices
        },
        "documents": {
            "total": total_documents,
            "pending": pending_documents,
            "processing": processing_documents,
            "completed": completed_documents,
            "failed": failed_documents
        },
        "chat_messages": {
            "total": total_messages
        }
    }

@router.post("/documents/upload-single")
async def upload_single_document(
    file: UploadFile = File(...),
    index_id: str = Form(...),
    custom_name: Optional[str] = Form(None),
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Upload a single document for processing (admin only)"""
    # Verify index exists
    index = await db.indices.find_one({"index_id": index_id})
    if not index:
        raise HTTPException(status_code=404, detail="Index not found")

    # Process document
    document = await llama_processor.process_single_file(
        file, index_id, admin_user, db, custom_name
    )

    return {
        "message": "Document uploaded and processing started",
        "document_id": str(document.id),
        "filename": document.original_filename,
        "status": document.processing_status
    }


@router.post("/documents/upload-multiple")
async def upload_multiple_documents(
    files: List[UploadFile] = File(...),
    index_id: str = Form(...),
    base_name: str = Form(...),
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Upload multiple documents for processing (admin only)"""
    # Verify index exists
    index = await db.indices.find_one({"index_id": index_id})
    if not index:
        raise HTTPException(status_code=404, detail="Index not found")

    # Process documents
    documents = await llama_processor.process_multiple_files(
        files, index_id, admin_user, db, base_name
    )

    return {
        "message": f"Successfully uploaded {len(documents)} documents",
        "documents": [
            {
                "id": str(doc.id),
                "filename": doc.original_filename,
                "status": doc.processing_status
            } for doc in documents
        ]
    }


@router.get("/documents/processing-status")
async def get_processing_status(
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get overall processing status (admin only)"""

    # Get counts for each status
    statuses = {
        "pending": 0,
        "processing": 0,
        "completed": 0,
        "failed": 0
    }

    # Count processing status
    for status in statuses.keys():
        statuses[status] = await db.documents.count_documents({
            "processing_status": status
        })

    # Count embedding status (same four status values)
    embedding_statuses = {}
    for status in statuses.keys():
        embedding_statuses[status] = await db.documents.count_documents({
            "embedding_status": status
        })

    # Get documents with unclear status
    unclear_docs = await db.documents.count_documents({
        "$or": [
            {"processing_status": {"$exists": False}},
            {"embedding_status": {"$exists": False}}
        ]
    })

    # Count every document; summing the status buckets would miss documents
    # whose processing_status is missing or has an unexpected value
    total_documents = await db.documents.count_documents({})

    return {
        "processing_status": statuses,
        "embedding_status": embedding_statuses,
        "unclear_status_count": unclear_docs,
        "total_documents": total_documents
    }


@router.get("/documents/{index_id}")
async def get_index_documents(
    index_id: str,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get all documents for an index (admin only)"""
    # Decode URL-encoded index_id
    decoded_index_id = unquote(index_id)

    # Verify index exists
    index = await db.indices.find_one({"index_id": decoded_index_id})
    if not index:
        raise HTTPException(status_code=404, detail="Index not found")

    # Get documents
    documents = []
    cursor = db.documents.find({"index_id": decoded_index_id})

    async for doc in cursor:
        documents.append({
            "id": str(doc["_id"]),
            "filename": doc["original_filename"],
            "file_size": doc["file_size"],
            # Status fields can be missing on some documents, so avoid a KeyError here
            "processing_status": doc.get("processing_status"),
            "embedding_status": doc.get("embedding_status"),
            "chunk_count": doc.get("chunk_count", 0),
            "created_at": doc["created_at"],
            "updated_at": doc["updated_at"]
        })

    return {
        "index_id": decoded_index_id,
        "index_name": index["name"],
        "documents": documents,
        "total_documents": len(documents)
    }


@router.post("/documents/{document_id}/reprocess")
async def reprocess_document(
    document_id: str,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Reprocess a document (admin only)"""
    # Reject malformed IDs up front; ObjectId() would otherwise raise and surface as a 500
    if not ObjectId.is_valid(document_id):
        raise HTTPException(status_code=400, detail="Invalid document ID")

    # Get document
    document = await db.documents.find_one({"_id": ObjectId(document_id)})
    if not document:
        raise HTTPException(status_code=404, detail="Document not found")

    # Reset processing status and clear previous parsing/embedding results
    await db.documents.update_one(
        {"_id": ObjectId(document_id)},
        {"$set": {
            "processing_status": "pending",
            "embedding_status": "pending",
            "parsed_text": None,
            "text_chunks": None,
            "chunk_count": 0,
            "vector_ids": None,
            "updated_at": datetime.utcnow()
        }}
    )

    # Create document object for reprocessing
    doc_obj = DocumentInDB(**document)

    # Start reprocessing in the background; the endpoint returns immediately
    import asyncio
    asyncio.create_task(llama_processor._process_document_async(doc_obj, db))

    return {
        "message": "Document reprocessing started",
        "document_id": document_id,
        "status": "pending"
    }


@router.delete("/documents/{document_id}")
async def delete_document(
    document_id: str,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Delete a document and its embeddings (admin only)"""
    # Reject malformed IDs up front; ObjectId() would otherwise raise and surface as a 500
    if not ObjectId.is_valid(document_id):
        raise HTTPException(status_code=400, detail="Invalid document ID")

    # Get document
    document = await db.documents.find_one({"_id": ObjectId(document_id)})
    if not document:
        raise HTTPException(status_code=404, detail="Document not found")

    # Use the enhanced document processor for complete cleanup
    from ...services.document_processor import document_processor
    success = await document_processor.delete_document(document_id, db)

    if not success:
        raise HTTPException(status_code=500, detail="Failed to delete document")

    return {
        "message": "Document deleted successfully",
        "document_id": document_id
    }


@router.post("/chat/query")
async def admin_chat_query(
    query: str = Form(...),
    index_id: str = Form(...),
    top_k: int = Form(5),
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Query documents using RAG (admin only)"""
    # Verify index exists
    index = await db.indices.find_one({"index_id": index_id})
    if not index:
        raise HTTPException(status_code=404, detail="Index not found")

    try:
        # Query vector store
        results = await llama_processor.query_documents(
            query, index_id, top_k
        )

        # Extract context chunks
        context_chunks = [result["content"] for result in results]

        # Generate response
        response = await llama_processor.generate_response(
            query, context_chunks, index_id
        )

        return {
            "query": query,
            "response": response,
            "sources": results,
            "index_id": index_id
        }

    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Error processing query: {str(e)}"
        )


@router.get("/indices")
async def get_all_indices(
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get all indices (admin only)"""
    indices = []
    cursor = db.indices.find({"status": "active"})

    async for index in cursor:
        # Count documents for this index
        doc_count = await db.documents.count_documents({"index_id": index["index_id"]})

        indices.append({
            "id": str(index["_id"]),
            "index_id": index["index_id"],
            "name": index["name"],
            "description": index.get("description", ""),
            "document_count": doc_count,
            "created_by": str(index["created_by"]),
            "created_at": index["created_at"],
            "chunk_size": index.get("chunk_size", 1000),
            "chunk_overlap": index.get("chunk_overlap", 200)
        })

    return {
        "indices": indices,
        "total": len(indices)
    }


@router.post("/indices/create")
async def create_index(
    name: str = Form(...),
    description: Optional[str] = Form(None),
    chunk_size: int = Form(1000),
    chunk_overlap: int = Form(200),
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Create a new index (admin only)"""
    import uuid

    # Generate unique index ID
    index_id = str(uuid.uuid4())

    # Create index record
    index_data = {
        "index_id": index_id,
        "name": name,
        "description": description,
        "created_by": admin_user.id,
        "created_at": datetime.utcnow(),
        "updated_at": datetime.utcnow(),
        "status": "active",
        "document_count": 0,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "embedding_model": "text-embedding-ada-002",
        "settings": {}
    }

    # Save to database
    result = await db.indices.insert_one(index_data)

    return {
        "message": "Index created successfully",
        "index_id": index_id,
        "name": name,
        "id": str(result.inserted_id)
    }


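`create_index` stores a `chunk_size` (default 1000) and a `chunk_overlap` (default 200) on each index record. As a rough illustration of what the two numbers mean, the sketch below splits text into fixed-size windows in which consecutive windows share `chunk_overlap` units, so each window starts `chunk_size - chunk_overlap` units after the previous one. It counts characters for simplicity. That is an assumption: the splitter `llama_processor` actually uses is not shown here, and LlamaIndex splitters typically count tokens and prefer sentence boundaries. `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size, overlapping character windows (illustration only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        # With overlap >= chunk_size the window would never move forward
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    stride = chunk_size - chunk_overlap  # distance between the starts of consecutive windows
    chunks: List[str] = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, a 2,600-character text yields windows starting at offsets 0, 800 and 1,600, so three chunks, each sharing 200 characters with its neighbour.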
@router.delete("/indices/{index_id}")
async def delete_index(
    index_id: str,
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Delete an index and all its documents (admin only)"""
    import shutil
    from pathlib import Path

    # Decode URL-encoded index_id
    decoded_index_id = unquote(index_id)

    # Check if index exists
    index = await db.indices.find_one({"index_id": decoded_index_id})
    if not index:
        raise HTTPException(status_code=404, detail="Index not found")

    try:
        # Get all documents for this index
        documents_cursor = db.documents.find({"index_id": decoded_index_id})
        document_count = 0

        async for doc in documents_cursor:
            document_count += 1
            # Delete embeddings from vector store
            await llama_processor.delete_document_embeddings(
                str(doc["_id"]), decoded_index_id
            )

            # Delete file from filesystem
            file_path = Path(doc["file_path"])
            if file_path.exists():
                try:
                    file_path.unlink()
                except Exception as e:
                    print(f"Error deleting file {file_path}: {e}")

        # Delete all documents for this index
        await db.documents.delete_many({"index_id": decoded_index_id})

        # Delete all chat messages for this index
        chat_result = await db.chat_messages.delete_many({"index_id": decoded_index_id})

        # Delete the index record
        await db.indices.delete_one({"index_id": decoded_index_id})

        # Use complete ChromaDB cleanup instead of manual deletion
        from ...services.rag_service import rag_service
        cleanup_result = await rag_service.delete_index_complete(decoded_index_id)
        if not cleanup_result["success"]:
            print(f"Warning during complete index cleanup: {cleanup_result['message']}")

        # Note: Cache clearing removed - caching is disabled for data freshness

        # Delete index upload directory
        index_dir = Path(settings.upload_dir) / decoded_index_id
        if index_dir.exists():
            try:
                shutil.rmtree(index_dir)
            except Exception as e:
                print(f"Error deleting index directory {index_dir}: {e}")

        return {
            "message": "Index deleted successfully",
            "index_id": decoded_index_id,
            "documents_deleted": document_count,
            "chat_messages_deleted": chat_result.deleted_count
        }

    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Error deleting index: {str(e)}"
        )


@router.post("/documents/process-pending")
async def process_pending_documents(
    admin_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Process all pending documents (admin only)"""
    import asyncio

    # Find all documents whose processing or embedding is pending or failed
    cursor = db.documents.find({
        "$or": [
            {"processing_status": "pending"},
            {"processing_status": "failed"},
            {"embedding_status": "pending"},
            {"embedding_status": "failed"}
        ]
    })

    processed_count = 0
    async for doc in cursor:
        try:
            document = DocumentInDB(**doc)

            # Start processing in the background (not awaited)
            asyncio.create_task(llama_processor._process_document_async(document, db))
            processed_count += 1

        except Exception as e:
            print(f"Error queueing document {doc['_id']} for processing: {e}")

    return {
        "message": f"Queued {processed_count} documents for processing",
        "count": processed_count
    }
303
backend/app/api/v1/auth.py
Normal file
@@ -0,0 +1,303 @@
from fastapi import APIRouter, Depends, HTTPException, status  # type: ignore
from motor.motor_asyncio import AsyncIOMotorDatabase  # type: ignore
from pydantic import BaseModel  # type: ignore
import logging

from ...config.database import get_database
from ...config.settings import settings
from ...core.security import verify_password, get_password_hash, create_access_token
from ...models.user import UserInDB, UserCreate, UserRole, UserResponse, AuthMethod
from ...core.auth import get_current_active_user
from ...services.sso_service import sso_service

router = APIRouter()
logger = logging.getLogger(__name__)


class LoginRequest(BaseModel):
    email: str
    password: str


class LoginResponse(BaseModel):
    access_token: str
    token_type: str
    user: dict


class RegisterRequest(BaseModel):
    email: str
    password: str
    role: UserRole = UserRole.USER


class SSOLoginRequest(BaseModel):
    access_token: str


class SSOConfigResponse(BaseModel):
    client_id: str
    authority: str
    redirect_uri: str
    enabled: bool


@router.post("/login", response_model=LoginResponse)
async def login(
    login_data: LoginRequest,
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Authenticate user with local credentials and return access token"""
    # Find user by email
    user = await db.users.find_one({"email": login_data.email})
    if not user:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect email or password",
            headers={"WWW-Authenticate": "Bearer"},
        )

    user_obj = UserInDB(**user)

    # Check if user has a local password (for local auth)
    if not user_obj.hashed_password:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="User account requires SSO authentication",
            headers={"WWW-Authenticate": "Bearer"},
        )

    # Verify password
    if not verify_password(login_data.password, user_obj.hashed_password):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect email or password",
            headers={"WWW-Authenticate": "Bearer"},
        )

    # Check if user is active
    if not user_obj.is_active:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Inactive user"
        )

    # Create access token
    access_token = create_access_token(data={"sub": str(user_obj.id)})

    return LoginResponse(
        access_token=access_token,
        token_type="bearer",
        user={
            "id": str(user_obj.id),
            "email": user_obj.email,
            "role": user_obj.role,
            "is_active": user_obj.is_active,
            "index_access": user_obj.index_access
        }
    )


@router.post("/register", response_model=dict)
async def register(
    register_data: RegisterRequest,
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Register a new user"""
    # Check if user already exists
    existing_user = await db.users.find_one({"email": register_data.email})
    if existing_user:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="User with this email already exists"
        )

    # Create new user.
    # NOTE: this handler has no authentication dependency and takes `role` from the
    # request body, so nothing here prevents a caller from requesting UserRole.ADMIN.
    hashed_password = get_password_hash(register_data.password)
    user_data = UserCreate(
        email=register_data.email,
        password=register_data.password,
        role=register_data.role
    )

    # Store only the hash, never the plaintext password
    user_dict = user_data.dict()
    user_dict["hashed_password"] = hashed_password
    del user_dict["password"]
    user_dict["index_access"] = []

    # Insert user into database
    result = await db.users.insert_one(user_dict)

    return {"message": "User registered successfully", "user_id": str(result.inserted_id)}


@router.get("/me", response_model=UserResponse)
async def get_current_user_info(
    current_user: UserInDB = Depends(get_current_active_user)
):
    """Get current user information"""
    return UserResponse(
        _id=current_user.id,
        email=current_user.email,
        role=current_user.role,
        is_active=current_user.is_active,
        index_access=current_user.index_access,
        auth_method=current_user.auth_method,
        sso_provider=current_user.sso_provider,
        sso_name=current_user.sso_name,
        last_sso_login=current_user.last_sso_login,
        created_at=current_user.created_at,
        updated_at=current_user.updated_at
    )


@router.post("/refresh", response_model=LoginResponse)
async def refresh_token(
    current_user: UserInDB = Depends(get_current_active_user)
):
    """Refresh access token for active user"""
    # Create new access token
    access_token = create_access_token(data={"sub": str(current_user.id)})

    return LoginResponse(
        access_token=access_token,
        token_type="bearer",
        user={
            "id": str(current_user.id),
            "email": current_user.email,
            "role": current_user.role,
            "is_active": current_user.is_active,
            "auth_method": current_user.auth_method,
            "sso_provider": current_user.sso_provider,
            "sso_name": current_user.sso_name,
            "index_access": current_user.index_access
        }
    )


@router.post("/logout")
async def logout():
    """Logout user (client should discard token)"""
    return {"message": "Logged out successfully"}


@router.get("/sso/config", response_model=SSOConfigResponse)
async def get_sso_config():
    """Get SSO configuration for frontend"""
    if not settings.sso_enabled:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="SSO is not enabled"
        )

    if not all([settings.azure_client_id, settings.azure_authority, settings.azure_redirect_uri]):
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="SSO is not properly configured"
        )

    return SSOConfigResponse(
        client_id=settings.azure_client_id,
        authority=settings.azure_authority,
        redirect_uri=settings.azure_redirect_uri,
        enabled=settings.sso_enabled
    )


@router.post("/sso/validate", response_model=LoginResponse)
async def sso_login(sso_data: SSOLoginRequest):
    """Validate SSO token and authenticate user"""
    logger.info("=== SSO Login Request ===")
    logger.info(f"SSO enabled: {settings.sso_enabled}")
    logger.info(f"Token length: {len(sso_data.access_token) if sso_data.access_token else 'None'}")

    if not settings.sso_enabled:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="SSO is not enabled"
        )

    try:
        logger.info("Starting SSO token processing...")
        # Process SSO login using the service
        user = await sso_service.process_sso_login(sso_data.access_token)
        logger.info(f"SSO processing successful, user: {user.email}")

        # Create our internal JWT token
        access_token = create_access_token(data={"sub": str(user.id)})
        logger.info("Internal JWT token created successfully")

        return LoginResponse(
            access_token=access_token,
            token_type="bearer",
            user={
                "id": str(user.id),
                "email": user.email,
                "role": user.role,
                "is_active": user.is_active,
                "auth_method": user.auth_method,
                "sso_provider": user.sso_provider,
                "sso_name": user.sso_name,
                "index_access": user.index_access
            }
        )

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"SSO authentication failed: {str(e)}", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"SSO authentication failed: {str(e)}"
        )


@router.post("/login/local", response_model=LoginResponse)
async def local_login(
    login_data: LoginRequest,
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Explicit local authentication (backup admin login)"""
    if not settings.allow_local_admin:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Local authentication is disabled"
        )

    # Only allow admin@oliver.agency for local backup
    if login_data.email != "admin@oliver.agency":
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Local authentication only available for admin account"
        )

    # Reuse the regular login handler, called directly as a coroutine
    return await login(login_data, db)


# Initialize default users
@router.post("/init-users")
async def init_default_users(
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Initialize default users (admin and user)"""
    # NOTE: this handler has no authentication dependency and seeds both
    # accounts with fixed default passwords. Existing accounts are left untouched.

    # Check if admin user exists
    admin_exists = await db.users.find_one({"email": "admin@oliver.agency"})
    if not admin_exists:
        admin_user = {
            "email": "admin@oliver.agency",
            "hashed_password": get_password_hash("admin123"),
            "role": UserRole.ADMIN,
            "is_active": True,
            "auth_method": AuthMethod.LOCAL,
            "index_access": [],
            "created_at": None,
            "updated_at": None
        }
        await db.users.insert_one(admin_user)

    # Check if regular user exists
    user_exists = await db.users.find_one({"email": "user@oliver.agency"})
    if not user_exists:
        regular_user = {
            "email": "user@oliver.agency",
            "hashed_password": get_password_hash("user123"),
            "role": UserRole.USER,
            "is_active": True,
            "auth_method": AuthMethod.LOCAL,
            "index_access": [],
            "created_at": None,
            "updated_at": None
        }
        await db.users.insert_one(regular_user)

    return {"message": "Default users initialized"}
406
backend/app/api/v1/chat.py
Normal file
@@ -0,0 +1,406 @@
from fastapi import APIRouter, Depends, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase
from typing import Dict, Any, List
import time
from datetime import datetime
from bson import ObjectId
from urllib.parse import unquote

from ...config.database import get_database
from ...core.auth import get_current_active_user
# Cache import removed - caching disabled for data freshness
from ...models.user import UserInDB
from ...models.chat import ChatQuery, ChatResponse, ChatMessageCreate, ChatMessageInDB
from ...services.rag_service import rag_service
from ...services.llama_processor import llama_processor
from ...services.chat_context_service import chat_context_service

router = APIRouter()


@router.post("/query", response_model=ChatResponse)
async def chat_query(
    query_data: ChatQuery,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Process a chat query against a document index"""
    start_time = time.time()

    # Check if user has access to this index
    if current_user.role.value != "admin" and query_data.index_id not in current_user.index_access:
        raise HTTPException(status_code=403, detail="Access denied to this index")

    # Check if index exists in database
    index = await db.indices.find_one({"index_id": query_data.index_id})
    if not index:
        raise HTTPException(
            status_code=404,
            detail=f"Index '{query_data.index_id}' not found"
        )

    # Check if index has processed documents
    processed_docs = await db.documents.count_documents({
        "index_id": query_data.index_id,
        "processing_status": "completed",
        "embedding_status": "completed"
    })

    # If no completed documents, check for any documents at all
    if processed_docs == 0:
        total_docs = await db.documents.count_documents({"index_id": query_data.index_id})
        if total_docs == 0:
            raise HTTPException(
                status_code=400,
                detail=f"Index '{query_data.index_id}' has no documents. Please upload documents first."
            )
        else:
            # Check processing status
            processing_docs = await db.documents.count_documents({
                "index_id": query_data.index_id,
                "$or": [
                    {"processing_status": "processing"},
                    {"embedding_status": "processing"},
                    {"processing_status": "pending"},
                    {"embedding_status": "pending"}
                ]
            })

            failed_docs = await db.documents.count_documents({
                "index_id": query_data.index_id,
                "$or": [
                    {"processing_status": "failed"},
                    {"embedding_status": "failed"}
                ]
            })

            if processing_docs > 0:
                raise HTTPException(
                    status_code=400,
                    detail=f"Index '{query_data.index_id}' has {processing_docs} documents still processing. Please wait for processing to complete."
                )
            elif failed_docs > 0:
                raise HTTPException(
                    status_code=400,
                    detail=f"Index '{query_data.index_id}' has {failed_docs} documents that failed to process. Please check the admin panel and reprocess the documents."
                )
            else:
                # Documents exist but their status is unclear; check whether any have parsed text.
                # "$nin" excludes both null and "": a dict literal cannot hold two "$ne" keys,
                # because the second one silently overwrites the first.
                docs_with_text = await db.documents.count_documents({
                    "index_id": query_data.index_id,
                    "parsed_text": {"$exists": True, "$nin": [None, ""]}
                })

                if docs_with_text > 0:
                    print(f"Warning: Index {query_data.index_id} has documents with unclear processing status but {docs_with_text} have parsed text")
                    # Continue with the query attempt
                else:
                    raise HTTPException(
                        status_code=400,
                        detail=f"Index '{query_data.index_id}' has documents but none have been processed successfully. Please check the admin panel."
                    )

    # Query vector store for relevant chunks
    try:
        vector_results = await llama_processor.query_documents(
            query_data.query, query_data.index_id, top_k=10
        )

        # Extract context chunks
        context_chunks = [result["content"] for result in vector_results]

        # Generate contextual response with conversation history
        ai_result = await chat_context_service.generate_contextual_response(
            query_data.query,
            query_data.index_id,
            str(current_user.id),
            db,
            context_chunks
        )

        result = {
            "success": True,
            "response": ai_result["response"],
            "sources": vector_results,
            "context_used": ai_result.get("context_used"),
            "context_messages_count": ai_result.get("context_messages_count", 0)
        }

    except Exception as e:
        print(f"Error in chat query: {e}")
        # Handle specific ChromaDB errors more gracefully
        if "does not exist" in str(e) or "Collection" in str(e):
            # Check document count again
            total_docs = await db.documents.count_documents({"index_id": query_data.index_id})
            if total_docs == 0:
                raise HTTPException(
                    status_code=404,
                    detail=f"No documents found in index '{query_data.index_id}'. Please upload documents first."
                )
            else:
                raise HTTPException(
                    status_code=404,
                    detail=f"Vector database not ready for index '{query_data.index_id}'. The documents may still be processing. Please wait and try again."
                )
        else:
            raise HTTPException(
                status_code=500,
                detail=f"Error processing query: {str(e)}"
            )

    # Every error path above raises, so `result` is always set at this point

    # Prepare response
    response_time = time.time() - start_time
    debug_info = {
        "sources": result.get("sources", []),
        "context_used": result.get("context_used"),
        "context_messages_count": result.get("context_messages_count", 0),
        "cached": False,
        "response_time": response_time
    }

    response = ChatResponse(
        response=result["response"],
        debug_info=debug_info,
        cached=False,
        response_time=response_time
    )

    # Save chat message to database
    await _save_chat_message(
        current_user.id, query_data, response, db
    )

    return response


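A Python dict literal silently keeps only the last value for a repeated key, so a filter written as `{"$exists": True, "$ne": None, "$ne": ""}` reaches MongoDB as `{"$exists": True, "$ne": ""}` and no longer excludes `null`. A single `$nin` with both values expresses the intended condition (per the MongoDB docs, a `null` in a `$nin` list also excludes documents missing the field). The sketch below shows the collapse and a pure-Python mirror of the intended predicate; `has_parsed_text` is a hypothetical helper for unit tests, based on an assumption about intent rather than code from this module.

```python
# Repeated keys in a dict literal: the last "$ne" overwrites the first
buggy_filter = {"$exists": True, "$ne": None, "$ne": ""}  # noqa: F601 (deliberate repeat)

# One operator that excludes both null and the empty string
fixed_filter = {"$exists": True, "$nin": [None, ""]}


def has_parsed_text(doc: dict) -> bool:
    """Pure-Python mirror of fixed_filter applied to the "parsed_text" field."""
    return "parsed_text" in doc and doc["parsed_text"] not in (None, "")
```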
@router.get("/history/{index_id}")
async def get_chat_history(
    index_id: str,
    limit: int = 50,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get chat history for a specific index"""
    # Decode URL-encoded index_id
    decoded_index_id = unquote(index_id)

    # Check if user has access to this index
    if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
        raise HTTPException(status_code=403, detail="Access denied to this index")

    # Fetch the newest `limit` messages, excluding soft-deleted ones; sorting
    # ascending before limit() would return the oldest messages instead
    cursor = db.chat_messages.find({
        "user_id": current_user.id,
        "index_id": decoded_index_id,
        "deleted_by_user": {"$ne": True}
    }).sort("created_at", -1).limit(limit)

    messages = []
    async for msg in cursor:
        message = ChatMessageInDB(**msg)
        # Use separate timestamps if available, otherwise use created_at
        user_time = msg.get("user_timestamp", message.created_at)
        assistant_time = msg.get("assistant_timestamp", message.created_at)

        messages.append({
            "id": str(message.id),
            "query": message.query,
            "response": message.response,
            "created_at": message.created_at,
            "user_timestamp": user_time,
            "assistant_timestamp": assistant_time,
            "response_time": message.response_time,
            "cached": message.cached,
            "sources": message.sources,
            "context_used": message.context_used
        })

    # Return messages in chronological order (oldest first)
    messages.reverse()
    return {"messages": messages}


@router.delete("/history/{index_id}")
|
||||
async def clear_chat_history(
|
||||
index_id: str,
|
||||
current_user: UserInDB = Depends(get_current_active_user),
|
||||
db: AsyncIOMotorDatabase = Depends(get_database)
|
||||
):
|
||||
"""Clear chat history for a specific index (soft delete)"""
|
||||
# Decode URL-encoded index_id
|
||||
decoded_index_id = unquote(index_id)
|
||||
|
||||
# Check if user has access to this index
|
||||
if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
|
||||
raise HTTPException(status_code=403, detail="Access denied to this index")
|
||||
|
||||
# Soft delete chat messages by marking them as deleted
|
||||
result = await db.chat_messages.update_many(
|
||||
{
|
||||
"user_id": current_user.id,
|
||||
"index_id": decoded_index_id,
|
||||
"deleted_by_user": {"$ne": True}
|
||||
},
|
||||
{
|
||||
"$set": {
|
||||
"deleted_by_user": True,
|
||||
"updated_at": datetime.utcnow()
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
# Note: Cache clearing removed - caching is disabled for data freshness
|
||||
|
||||
return {"message": f"Cleared {result.modified_count} chat messages"}
|
||||
|
||||
@router.get("/context/{index_id}")
|
||||
async def get_conversation_context(
|
||||
index_id: str,
|
||||
limit: int = 5,
|
||||
current_user: UserInDB = Depends(get_current_active_user),
|
||||
db: AsyncIOMotorDatabase = Depends(get_database)
|
||||
):
|
||||
"""Get conversation context for debugging/display"""
|
||||
# Decode URL-encoded index_id
|
||||
decoded_index_id = unquote(index_id)
|
||||
|
||||
# Check if user has access to this index
|
||||
if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
|
||||
raise HTTPException(status_code=403, detail="Access denied to this index")
|
||||
|
||||
# Get conversation context
|
||||
context_messages = await chat_context_service.get_conversation_context(
|
||||
str(current_user.id), decoded_index_id, db, limit
|
||||
)
|
||||
|
||||
return {
|
||||
"context_messages": context_messages,
|
||||
"count": len(context_messages),
|
||||
"formatted_context": chat_context_service.format_context_for_ai(context_messages)
|
||||
}
|
||||
|
||||
@router.get("/index-status/{index_id}")
|
||||
async def get_index_chat_status(
|
||||
index_id: str,
|
||||
current_user: UserInDB = Depends(get_current_active_user),
|
||||
db: AsyncIOMotorDatabase = Depends(get_database)
|
||||
):
|
||||
"""Check if an index is ready for chat queries"""
|
||||
# Decode URL-encoded index_id
|
||||
decoded_index_id = unquote(index_id)
|
||||
|
||||
# Check if user has access to this index
|
||||
if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
|
||||
raise HTTPException(status_code=403, detail="Access denied to this index")
|
||||
|
||||
# Check if index exists in database
|
||||
index = await db.indices.find_one({"index_id": decoded_index_id})
|
||||
if not index:
|
||||
return {
|
||||
"ready": False,
|
||||
"reason": "Index not found",
|
||||
"details": {
|
||||
"index_exists": False,
|
||||
"total_documents": 0,
|
||||
"processed_documents": 0,
|
||||
"failed_documents": 0,
|
||||
"processing_documents": 0
|
||||
}
|
||||
}
|
||||
|
||||
# Get document statistics
|
||||
total_docs = await db.documents.count_documents({"index_id": decoded_index_id})
|
||||
processed_docs = await db.documents.count_documents({
|
||||
"index_id": decoded_index_id,
|
||||
"processing_status": "completed",
|
||||
"embedding_status": "completed"
|
||||
})
|
||||
failed_docs = await db.documents.count_documents({
|
||||
"index_id": decoded_index_id,
|
||||
"$or": [
|
||||
{"processing_status": "failed"},
|
||||
{"embedding_status": "failed"}
|
||||
]
|
||||
})
|
||||
processing_docs = await db.documents.count_documents({
|
||||
"index_id": decoded_index_id,
|
||||
"$or": [
|
||||
{"processing_status": "processing"},
|
||||
{"embedding_status": "processing"},
|
||||
{"processing_status": "pending"},
|
||||
{"embedding_status": "pending"}
|
||||
]
|
||||
})
|
||||
|
||||
# Check ChromaDB collection
|
||||
collection_info = llama_processor.get_collection_info(decoded_index_id)
|
||||
|
||||
# Determine if ready
|
||||
ready = processed_docs > 0 and collection_info["exists"] and collection_info["count"] > 0
|
||||
|
||||
reason = ""
|
||||
if total_docs == 0:
|
||||
reason = "No documents uploaded"
|
||||
elif processed_docs == 0:
|
||||
if processing_docs > 0:
|
||||
reason = f"{processing_docs} documents still processing"
|
||||
elif failed_docs > 0:
|
||||
reason = f"All {failed_docs} documents failed to process"
|
||||
else:
|
||||
reason = "No documents have been processed successfully"
|
||||
elif not collection_info["exists"]:
|
||||
reason = "Vector database collection not found"
|
||||
elif collection_info["count"] == 0:
|
||||
reason = "Vector database collection is empty"
|
||||
|
||||
return {
|
||||
"ready": ready,
|
||||
"reason": reason if not ready else "Index ready for queries",
|
||||
"details": {
|
||||
"index_exists": True,
|
||||
"index_name": index["name"],
|
||||
"total_documents": total_docs,
|
||||
"processed_documents": processed_docs,
|
||||
"failed_documents": failed_docs,
|
||||
"processing_documents": processing_docs,
|
||||
"collection_exists": collection_info["exists"],
|
||||
"collection_count": collection_info.get("count", 0),
|
||||
"collection_error": collection_info.get("error")
|
||||
}
|
||||
}
|
||||
|
||||
async def _save_chat_message(
|
||||
user_id,
|
||||
query_data: ChatQuery,
|
||||
response: ChatResponse,
|
||||
db: AsyncIOMotorDatabase
|
||||
):
|
||||
"""Save chat message to database with proper timestamp"""
|
||||
try:
|
||||
current_time = datetime.utcnow()
|
||||
|
||||
message_data = ChatMessageCreate(
|
||||
user_id=user_id,
|
||||
index_id=query_data.index_id,
|
||||
query=query_data.query,
|
||||
response=response.response,
|
||||
created_at=current_time,
|
||||
updated_at=current_time
|
||||
)
|
||||
|
||||
message_dict = message_data.dict()
|
||||
message_dict["debug_info"] = response.debug_info
|
||||
message_dict["response_time"] = response.response_time
|
||||
message_dict["cached"] = response.cached
|
||||
message_dict["sources"] = response.debug_info.get("sources", [])
|
||||
message_dict["context_used"] = response.debug_info.get("context_used")
|
||||
message_dict["created_at"] = current_time
|
||||
message_dict["updated_at"] = current_time
|
||||
|
||||
# Add separate timestamps for user message and assistant response
|
||||
message_dict["user_timestamp"] = current_time
|
||||
message_dict["assistant_timestamp"] = current_time
|
||||
message_dict["deleted_by_user"] = False
|
||||
|
||||
await db.chat_messages.insert_one(message_dict)
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error saving chat message: {e}")
|
||||
# Don't fail the request if we can't save the message
|
||||
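The readiness check in `get_index_chat_status` is a pure decision over document counts and the state of the vector collection. The sketch below restates that decision as a standalone function so it can be unit-tested without MongoDB or ChromaDB. The helper name `index_readiness` and its parameters are hypothetical (not part of this codebase); the reason strings and their precedence are copied from the endpoint.

```python
from typing import Tuple


def index_readiness(
    total_docs: int,
    processed_docs: int,
    failed_docs: int,
    processing_docs: int,
    collection_exists: bool,
    collection_count: int,
) -> Tuple[bool, str]:
    """Hypothetical helper mirroring the ready/reason logic of get_index_chat_status."""
    # Ready only if at least one document is fully processed AND the collection has vectors
    ready = processed_docs > 0 and collection_exists and collection_count > 0
    if ready:
        return True, "Index ready for queries"

    # Otherwise report the first failing condition, checked in the same order as the endpoint
    reason = ""
    if total_docs == 0:
        reason = "No documents uploaded"
    elif processed_docs == 0:
        if processing_docs > 0:
            reason = f"{processing_docs} documents still processing"
        elif failed_docs > 0:
            reason = f"All {failed_docs} documents failed to process"
        else:
            reason = "No documents have been processed successfully"
    elif not collection_exists:
        reason = "Vector database collection not found"
    elif collection_count == 0:
        reason = "Vector database collection is empty"
    return False, reason


print(index_readiness(3, 0, 0, 2, False, 0))
# → (False, '2 documents still processing')
```

Keeping the decision pure means the endpoint only has to gather counts and collection info, and every reason branch can be exercised with plain integers.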
323
backend/app/api/v1/documents.py
Normal file
|
|
@ -0,0 +1,323 @@
|
|||
from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Form
from fastapi.responses import FileResponse
from motor.motor_asyncio import AsyncIOMotorDatabase
from typing import List
from bson import ObjectId
from datetime import datetime
import os
import asyncio
from urllib.parse import unquote

from ...config.database import get_database
from ...core.auth import get_current_active_user, require_index_access
from ...models.user import UserInDB
from ...models.document import Document, DocumentInDB
from ...models.contract_summary import ContractSummaryResponse
from ...services.document_processor import document_processor
from ...services.rag_service import rag_service
from ...services.llama_processor import llama_processor

router = APIRouter()


@router.post("/upload", response_model=dict)
async def upload_document(
    file: UploadFile = File(...),
    index_id: str = Form(...),
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Upload a document to an index"""
    # Check if user has access to this index
    if current_user.role.value != "admin" and index_id not in current_user.index_access:
        raise HTTPException(status_code=403, detail="Access denied to this index")

    # Process the upload using LlamaProcessor (includes contract summary processing)
    document = await llama_processor.process_single_file(file, index_id, current_user, db)

    return {
        "message": "Document uploaded successfully",
        "document_id": str(document.id),
        "filename": document.filename,
        "processing_status": "pending"
    }


# Background processing is now handled by LlamaProcessor automatically


@router.get("/index/{index_id}", response_model=List[dict])
async def get_documents_by_index(
    index_id: str,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get all documents for a specific index"""
    # Decode URL-encoded index_id
    decoded_index_id = unquote(index_id)

    # Check if user has access to this index
    if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
        raise HTTPException(status_code=403, detail="Access denied to this index")

    documents = await document_processor.get_documents_by_index(decoded_index_id, db)

    return [
        {
            "id": str(doc.id),
            "filename": doc.filename,
            "original_filename": doc.original_filename,
            "file_size": doc.file_size,
            "content_type": doc.content_type,
            "processing_status": doc.processing_status,
            "embedding_status": doc.embedding_status,
            "summary_status": getattr(doc, 'summary_status', 'pending'),
            "created_at": doc.created_at,
            "updated_at": doc.updated_at,
            "metadata": doc.metadata,
            "chunk_count": doc.chunk_count
        }
        for doc in documents
    ]


@router.get("/{document_id}/download")
async def download_document(
    document_id: str,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Download a document"""
    document = await db.documents.find_one({"_id": ObjectId(document_id)})
    if not document:
        raise HTTPException(status_code=404, detail="Document not found")

    doc = DocumentInDB(**document)

    # Check if user has access to this document's index
    if current_user.role.value != "admin" and doc.index_id not in current_user.index_access:
        raise HTTPException(status_code=403, detail="Access denied to this document")

    # Check if file exists
    if not os.path.exists(doc.file_path):
        raise HTTPException(status_code=404, detail="File not found on disk")

    return FileResponse(
        path=doc.file_path,
        filename=doc.original_filename,
        media_type=doc.content_type
    )


@router.get("/{document_id}/text")
async def get_document_text(
    document_id: str,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get the parsed text content of a document"""
    print(f"🔍 DEBUG - Getting document text for ID: {document_id}")

    document = await db.documents.find_one({"_id": ObjectId(document_id)})
    if not document:
        print(f"❌ DEBUG - Document not found: {document_id}")
        raise HTTPException(status_code=404, detail="Document not found")

    print(f"✅ DEBUG - Document found: {document.get('original_filename', 'Unknown')}")
    doc = DocumentInDB(**document)

    # Check if user has access to this document's index
    if current_user.role.value != "admin" and doc.index_id not in current_user.index_access:
        print(f"❌ DEBUG - Access denied to index: {doc.index_id}")
        raise HTTPException(status_code=403, detail="Access denied to this document")

    # Check if document has been processed
    print(f"📊 DEBUG - Processing status: {doc.processing_status}")
    if doc.processing_status != "completed":
        raise HTTPException(status_code=400, detail="Document not yet processed")

    # Check if parsed text exists
    parsed_text = getattr(doc, 'parsed_text', None)
    print(f"📝 DEBUG - Parsed text length: {len(parsed_text) if parsed_text else 0}")
    if not parsed_text:
        raise HTTPException(status_code=404, detail="Document text not available")

    print("✅ DEBUG - Returning document text successfully")
    return {
        "success": True,
        "document_id": str(doc.id),
        "filename": doc.original_filename,
        "text": parsed_text,
        "text_length": len(parsed_text),
        "processing_status": doc.processing_status,
        "created_at": doc.created_at,
        "updated_at": doc.updated_at
    }


@router.get("/{document_id}", response_model=dict)
async def get_document(
    document_id: str,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get a specific document"""
    document = await db.documents.find_one({"_id": ObjectId(document_id)})
    if not document:
        raise HTTPException(status_code=404, detail="Document not found")

    doc = DocumentInDB(**document)

    # Check if user has access to this document's index
    if current_user.role.value != "admin" and doc.index_id not in current_user.index_access:
        raise HTTPException(status_code=403, detail="Access denied to this document")

    return {
        "id": str(doc.id),
        "filename": doc.filename,
        "original_filename": doc.original_filename,
        "file_size": doc.file_size,
        "content_type": doc.content_type,
        "index_id": doc.index_id,
        "processing_status": doc.processing_status,
        "created_at": doc.created_at,
        "updated_at": doc.updated_at,
        "metadata": doc.metadata
    }


@router.delete("/{document_id}")
async def delete_document(
    document_id: str,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Delete a document"""
    document = await db.documents.find_one({"_id": ObjectId(document_id)})
    if not document:
        raise HTTPException(status_code=404, detail="Document not found")

    doc = DocumentInDB(**document)

    # Only admins can delete documents
    if current_user.role.value != "admin":
        raise HTTPException(status_code=403, detail="Only administrators can delete documents")

    # Delete the document
    success = await document_processor.delete_document(document_id, db)

    if success:
        return {"message": "Document deleted successfully"}
    else:
        raise HTTPException(status_code=500, detail="Failed to delete document")


@router.get("/{document_id}/summary", response_model=ContractSummaryResponse)
async def get_document_summary(
    document_id: str,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get structured contract summary of a document"""
    document = await db.documents.find_one({"_id": ObjectId(document_id)})
    if not document:
        raise HTTPException(status_code=404, detail="Document not found")

    doc = DocumentInDB(**document)

    # Check if user has access to this document's index
    if current_user.role.value != "admin" and doc.index_id not in current_user.index_access:
        raise HTTPException(status_code=403, detail="Access denied to this document")

    # Check if document has been processed
    if doc.processing_status != "completed":
        raise HTTPException(status_code=400, detail="Document not yet processed")

    # Get summary status
    summary_status = getattr(doc, 'summary_status', 'pending')

    # If summary is completed, return structured summary
    if summary_status == "completed" and hasattr(doc, 'contract_summary') and doc.contract_summary:
        from ...models.contract_summary import ContractSummary
        contract_summary = ContractSummary(**doc.contract_summary)

        # Prefer the summary's own timestamp; fall back to the document's creation time
        summary_created_at = getattr(doc, 'summary_created_at', doc.created_at)

        return ContractSummaryResponse(
            success=True,
            summary=contract_summary,
            status=summary_status,
            filename=doc.original_filename,
            created_at=summary_created_at.isoformat() if summary_created_at else None,
            updated_at=doc.updated_at.isoformat() if doc.updated_at else None
        )

    # If summary is processing, return status
    elif summary_status == "processing":
        return ContractSummaryResponse(
            success=False,
            status=summary_status,
            filename=doc.original_filename,
            error="Contract summary is currently being processed. Please check back in a few moments."
        )

    # If summary failed, return error
    elif summary_status == "failed":
        error_msg = "Contract summary extraction failed."
        if hasattr(doc, 'metadata') and doc.metadata and 'summary_error' in doc.metadata:
            error_msg += f" Error: {doc.metadata['summary_error']}"

        return ContractSummaryResponse(
            success=False,
            status=summary_status,
            filename=doc.original_filename,
            error=error_msg
        )

    # If summary is pending, return pending status
    else:
        return ContractSummaryResponse(
            success=False,
            status="pending",
            filename=doc.original_filename,
            error="Contract summary processing has not started yet. Please try again later."
        )


@router.post("/{document_id}/summary/reprocess")
async def reprocess_document_summary(
    document_id: str,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Reprocess contract summary for a document"""
    document = await db.documents.find_one({"_id": ObjectId(document_id)})
    if not document:
        raise HTTPException(status_code=404, detail="Document not found")

    doc = DocumentInDB(**document)

    # Check if user has access to this document's index
    if current_user.role.value != "admin" and doc.index_id not in current_user.index_access:
        raise HTTPException(status_code=403, detail="Access denied to this document")

    # Check if document has been processed
    if doc.processing_status != "completed" or not doc.parsed_text:
        raise HTTPException(status_code=400, detail="Document not yet processed or text not available")

    # Reset summary status and trigger reprocessing
    # (llama_processor is already imported at module level)
    try:
        # Reset summary status
        await db.documents.update_one(
            {"_id": ObjectId(document_id)},
            {"$set": {
                "summary_status": "pending",
                "contract_summary": None,
                "summary_created_at": None,
                "updated_at": datetime.utcnow()
            }}
        )

        # Trigger summary extraction asynchronously
        asyncio.create_task(llama_processor._extract_contract_summary(
            document_id, doc.parsed_text, doc.original_filename, db
        ))

        return {
            "message": "Contract summary reprocessing started",
            "document_id": document_id,
            "filename": doc.original_filename,
            "status": "processing"
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error starting summary reprocessing: {str(e)}")
265
backend/app/api/v1/indices.py
Normal file
|
|
@ -0,0 +1,265 @@
|
|||
from fastapi import APIRouter, Depends, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase
from typing import List
from bson import ObjectId
from datetime import datetime
import uuid
from urllib.parse import unquote

from ...config.database import get_database
from ...core.auth import get_current_active_user, get_current_admin_user
from ...models.user import UserInDB
from ...models.index import Index, IndexCreate, IndexInDB
from ...services.rag_service import rag_service
from ...services.document_processor import document_processor

router = APIRouter()


@router.post("/create", response_model=dict)
async def create_index(
    index_data: IndexCreate,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Create a new document index"""
    # Generate unique index ID
    index_id = f"{index_data.name.lower().replace(' ', '-')}-{datetime.now().strftime('%Y-%m-%d')}-{str(uuid.uuid4())[:8]}"

    # Create index record
    index_dict = index_data.dict()
    index_dict["index_id"] = index_id
    index_dict["status"] = "active"
    index_dict["document_count"] = 0
    index_dict["settings"] = {}
    index_dict["created_at"] = datetime.utcnow()
    index_dict["updated_at"] = datetime.utcnow()

    # Insert into database
    result = await db.indices.insert_one(index_dict)
    index_dict["_id"] = result.inserted_id

    # Grant access to the creator
    await db.users.update_one(
        {"_id": current_user.id},
        {"$addToSet": {"index_access": index_id}}
    )

    return {
        "message": "Index created successfully",
        "index_id": index_id,
        "name": index_data.name,
        "id": str(result.inserted_id)
    }


@router.get("", response_model=List[dict])
async def get_user_indices(
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get all indices accessible to the current user"""
    # Ensure index_access is a list and not None
    user_index_access = current_user.index_access if current_user.index_access else []

    if current_user.role.value == "admin":
        # Admin can see all indices
        cursor = db.indices.find({"status": "active"})
    else:
        # Regular users see only their accessible indices
        # If user has no index access, they should see no indices
        if not user_index_access:
            return []

        cursor = db.indices.find({
            "index_id": {"$in": user_index_access},
            "status": "active"
        })

    indices = []
    async for index in cursor:
        index_obj = IndexInDB(**index)

        # Double-check access control for non-admin users
        if current_user.role.value != "admin":
            if index_obj.index_id not in user_index_access:
                continue  # Skip this index if user doesn't have access

        # Get real-time document count instead of stored value
        real_document_count = await db.documents.count_documents({"index_id": index_obj.index_id})

        indices.append({
            "id": str(index_obj.id),
            "index_id": index_obj.index_id,
            "name": index_obj.name,
            "description": index_obj.description,
            "document_count": real_document_count,
            "created_at": index_obj.created_at,
            "updated_at": index_obj.updated_at,
            "status": index_obj.status
        })

    return indices


@router.get("/{index_id}", response_model=dict)
async def get_index_details(
    index_id: str,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Get details of a specific index"""
    # Decode URL-encoded index_id
    decoded_index_id = unquote(index_id)

    # Check if user has access to this index
    if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
        raise HTTPException(status_code=403, detail="Access denied to this index")

    index = await db.indices.find_one({"index_id": decoded_index_id})
    if not index:
        raise HTTPException(status_code=404, detail="Index not found")

    index_obj = IndexInDB(**index)

    # Get document count
    document_count = await db.documents.count_documents({"index_id": decoded_index_id})

    # Get documents
    documents = await document_processor.get_documents_by_index(decoded_index_id, db)

    return {
        "id": str(index_obj.id),
        "index_id": index_obj.index_id,
        "name": index_obj.name,
        "description": index_obj.description,
        "document_count": document_count,
        "created_at": index_obj.created_at,
        "updated_at": index_obj.updated_at,
        "status": index_obj.status,
        "settings": index_obj.settings,
        "documents": [
            {
                "id": str(doc.id),
                "filename": doc.filename,
                "original_filename": doc.original_filename,
                "processing_status": doc.processing_status,
                "created_at": doc.created_at
            }
            for doc in documents
        ]
    }


@router.post("/{index_id}/rebuild")
async def rebuild_index(
    index_id: str,
    current_user: UserInDB = Depends(get_current_active_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Rebuild the vector index from all documents"""
    # Decode URL-encoded index_id
    decoded_index_id = unquote(index_id)

    # Check if user has access to this index
    if current_user.role.value != "admin" and decoded_index_id not in current_user.index_access:
        raise HTTPException(status_code=403, detail="Access denied to this index")

    # Get all documents for this index
    documents = await document_processor.get_documents_by_index(decoded_index_id, db)

    if not documents:
        raise HTTPException(status_code=400, detail="No documents found for this index")

    # NOTE: Index rebuilding is now handled by reprocessing documents through LlamaProcessor
    # Clear existing vectors and reprocess all documents in this index
    import asyncio
    from ...services.llama_processor import llama_processor

    reprocessed_count = 0
    for doc in documents:
        try:
            print(f"Rebuilding document {doc.id}: {doc.original_filename}")

            # Step 1: Clear existing vectors from ChromaDB
            if doc.vector_ids:
                print(f" - Clearing {len(doc.vector_ids)} existing vectors")
                await llama_processor.delete_document_embeddings(
                    str(doc.id),
                    decoded_index_id
                )

            # Step 2: Clear document metadata and reset status
            await db.documents.update_one(
                {"_id": doc.id},
                {
                    "$set": {
                        "processing_status": "pending",
                        "embedding_status": "pending",
                        "summary_status": "pending",
                        "updated_at": datetime.utcnow()
                    },
                    "$unset": {
                        "parsed_text": "",
                        "text_chunks": "",
                        "chunk_count": "",
                        "vector_ids": "",
                        "contract_summary": "",
                        "summary_created_at": ""
                    }
                }
            )

            # Step 3: Start reprocessing
            asyncio.create_task(llama_processor._process_document_async(doc, db))
            reprocessed_count += 1

        except Exception as e:
            print(f"Error queueing document {doc.id} for reprocessing: {e}")

    # Update index timestamp
    await db.indices.update_one(
        {"index_id": decoded_index_id},
        {"$set": {"updated_at": datetime.utcnow()}}
    )

    return {
        "message": f"Index rebuild started - {reprocessed_count} documents queued for reprocessing",
        "document_count": reprocessed_count
    }


@router.delete("/{index_id}")
async def delete_index(
    index_id: str,
    current_user: UserInDB = Depends(get_current_admin_user),
    db: AsyncIOMotorDatabase = Depends(get_database)
):
    """Delete an index (admin only)"""
    # Decode URL-encoded index_id
    decoded_index_id = unquote(index_id)

    index = await db.indices.find_one({"index_id": decoded_index_id})
    if not index:
        raise HTTPException(status_code=404, detail="Index not found")

    # Delete vector index with complete cleanup
    deletion_result = await rag_service.delete_index_complete(decoded_index_id)
    if not deletion_result["success"]:
        print(f"Warning during index deletion: {deletion_result['message']}")

    # Delete all documents in this index
    documents = await document_processor.get_documents_by_index(decoded_index_id, db)
    for doc in documents:
        await document_processor.delete_document(str(doc.id), db)

    # Note: Cache clearing removed - caching is disabled for data freshness

    # Mark index as deleted
    await db.indices.update_one(
        {"index_id": decoded_index_id},
        {"$set": {"status": "deleted", "updated_at": datetime.utcnow()}}
    )

    # Remove index access from all users
    await db.users.update_many(
        {"index_access": decoded_index_id},
        {"$pull": {"index_access": decoded_index_id}}
    )

    return {"message": "Index deleted successfully"}
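`create_index` builds `index_id` from the lowercased index name, the current local date, and the first 8 characters of a random UUID4. The sketch below isolates that expression in a hypothetical `make_index_id` helper (not part of this codebase) with the date and UUID injectable, so the format can be checked deterministically; it uses `date.today()` in place of `datetime.now()`, which yields the same `YYYY-MM-DD` string. As in the endpoint, only spaces are replaced, so other characters in the name pass through unchanged.

```python
import uuid
from datetime import date
from typing import Optional


def make_index_id(name: str, today: Optional[date] = None, uid: Optional[uuid.UUID] = None) -> str:
    """Hypothetical helper reproducing the index_id format used by create_index."""
    today = today or date.today()
    uid = uid or uuid.uuid4()
    # Lowercase the name and replace spaces only; every other character is kept as-is
    slug = name.lower().replace(" ", "-")
    # Layout: <slug>-<YYYY-MM-DD>-<first 8 characters of the UUID string>
    return f"{slug}-{today.strftime('%Y-%m-%d')}-{str(uid)[:8]}"


fixed = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(make_index_id("Vendor Contracts", date(2024, 1, 15), fixed))
# → vendor-contracts-2024-01-15-12345678
```

Eight hex characters give about 4.3 billion suffixes per name and day, which makes collisions unlikely but not impossible; the endpoint does not check for an existing `index_id` before inserting.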
12
backend/app/config/__init__.py
Normal file
|
|
@ -0,0 +1,12 @@
|
|||
from .settings import settings
from .database import get_database, get_redis, connect_to_mongo, close_mongo_connection, connect_to_redis, close_redis_connection

__all__ = [
    "settings",
    "get_database",
    "get_redis",
    "connect_to_mongo",
    "close_mongo_connection",
    "connect_to_redis",
    "close_redis_connection"
]
56
backend/app/config/database.py
Normal file
@@ -0,0 +1,56 @@
from motor.motor_asyncio import AsyncIOMotorClient
from typing import Optional
import redis.asyncio as redis
from .settings import settings

class Database:
    client: Optional[AsyncIOMotorClient] = None
    database = None
    redis_client: Optional[redis.Redis] = None

db = Database()

async def connect_to_mongo():
    """Create database connection"""
    db.client = AsyncIOMotorClient(settings.mongodb_url)
    db.database = db.client[settings.database_name]

    # Test connection
    try:
        await db.client.admin.command('ping')
        print("Connected to MongoDB successfully!")
    except Exception as e:
        print(f"Error connecting to MongoDB: {e}")
        raise

async def close_mongo_connection():
    """Close database connection"""
    if db.client:
        db.client.close()
        print("Disconnected from MongoDB")

async def connect_to_redis():
    """Create Redis connection"""
    if settings.cache_enabled:
        try:
            db.redis_client = redis.from_url(settings.redis_url)
            await db.redis_client.ping()
            print("Connected to Redis successfully!")
        except Exception as e:
            print(f"Error connecting to Redis: {e}")
            print("Continuing without Redis cache...")
            db.redis_client = None

async def close_redis_connection():
    """Close Redis connection"""
    if db.redis_client:
        await db.redis_client.close()
        print("Disconnected from Redis")

def get_database():
    """Get database instance"""
    return db.database

def get_redis():
    """Get Redis instance"""
    return db.redis_client
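`connect_to_redis` treats Redis as optional: it only connects when `settings.cache_enabled` is true (it is `False` by default in `settings.py`), and any failure during `ping()` leaves `db.redis_client` as `None`, which is then what `get_redis()` returns. A minimal asyncio sketch of that fallback logic, with a hypothetical `FakeRedis` standing in for `redis.asyncio.Redis` so it runs without a server:

```python
import asyncio
from typing import Optional


class FakeRedis:
    """Hypothetical stand-in for redis.asyncio.Redis; only ping() is modelled."""

    def __init__(self, reachable: bool) -> None:
        self.reachable = reachable

    async def ping(self) -> bool:
        if not self.reachable:
            raise ConnectionError("Connection refused")
        return True


async def connect_optional_cache(client: FakeRedis, cache_enabled: bool) -> Optional[FakeRedis]:
    """Mirror of connect_to_redis(): return a live client, or None to run uncached."""
    if not cache_enabled:
        return None  # cache disabled: Redis is never contacted
    try:
        await client.ping()
        return client
    except Exception as exc:
        print(f"Error connecting to Redis: {exc}")
        print("Continuing without Redis cache...")
        return None


async def main() -> list:
    return [
        await connect_optional_cache(FakeRedis(reachable=True), cache_enabled=False),
        await connect_optional_cache(FakeRedis(reachable=False), cache_enabled=True),
        await connect_optional_cache(FakeRedis(reachable=True), cache_enabled=True),
    ]


results = asyncio.run(main())
```

Every consumer therefore has to tolerate a `None` client; `CacheService` does so by returning early from each operation.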
53
backend/app/config/settings.py
Normal file
@@ -0,0 +1,53 @@
from pydantic_settings import BaseSettings
from typing import List, Optional
import os

class Settings(BaseSettings):
    # Database
    mongodb_url: str = "mongodb://localhost:27017"
    database_name: str = "contract_analysis"

    # Redis
    redis_url: str = "redis://localhost:6379"

    # Authentication
    jwt_secret_key: str = "your-super-secret-jwt-key-change-this-in-production"
    jwt_algorithm: str = "HS256"
    jwt_expire_minutes: int = 180

    # Azure AD / SSO Configuration
    azure_client_id: Optional[str] = None
    azure_tenant_id: Optional[str] = None
    azure_redirect_uri: Optional[str] = None
    azure_authority: Optional[str] = None
    sso_enabled: bool = False
    allow_local_admin: bool = True

    # OpenAI
    openai_api_key: str
    llamaparse_api_key: str  # Required for document processing

    # Application
    debug: bool = True
    cors_origins: List[str] = ["http://localhost:3000", "http://localhost:3002", "https://ai-sandbox.oliver.solutions", "*"]
    upload_dir: str = "./uploads"
    indices_dir: str = "./indices"

    # Document processing limits
    max_document_chars: int = 1000000  # 1 million characters for contract summaries
    max_summary_chars: int = 100000  # 100k characters for simple summaries

    # Cache - DISABLED for data freshness and debugging
    cache_enabled: bool = False
    cache_ttl: int = 3600

    class Config:
        env_file = ".env"
        case_sensitive = False

# Create settings instance
settings = Settings()

# Ensure directories exist
os.makedirs(settings.upload_dir, exist_ok=True)
os.makedirs(settings.indices_dir, exist_ok=True)
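`Settings` is a `pydantic_settings.BaseSettings` subclass, so each field can be overridden from the process environment or from the `.env` file. Environment variables take precedence over `.env` values, and names match case-insensitively because `case_sensitive = False`. `openai_api_key` and `llamaparse_api_key` have no defaults, so `Settings()` raises a validation error when `settings.py` is imported if neither source provides them. The stdlib sketch below models only that lookup order; `resolve_settings` is hypothetical, and unlike the real class it does no type coercion (pydantic would turn the string `"false"` into `False`, for example).

```python
from typing import Mapping

# A subset of the defaults declared on Settings (illustrative, not exhaustive).
DEFAULTS = {
    "mongodb_url": "mongodb://localhost:27017",
    "database_name": "contract_analysis",
    "cache_enabled": False,
}
REQUIRED = ("openai_api_key", "llamaparse_api_key")  # no default on Settings


def resolve_settings(environ: Mapping[str, str], dotenv: Mapping[str, str]) -> dict:
    """Hypothetical model of the BaseSettings lookup order:
    environment variable > .env entry > field default; names are case-insensitive."""
    env = {k.lower(): v for k, v in environ.items()}
    dot = {k.lower(): v for k, v in dotenv.items()}
    resolved, missing = {}, []
    for name in (*DEFAULTS, *REQUIRED):
        if name in env:
            resolved[name] = env[name]
        elif name in dot:
            resolved[name] = dot[name]
        elif name in DEFAULTS:
            resolved[name] = DEFAULTS[name]
        else:
            missing.append(name)
    if missing:
        raise ValueError(f"missing required settings: {missing}")
    return resolved


cfg = resolve_settings(
    environ={"OPENAI_API_KEY": "sk-test", "MONGODB_URL": "mongodb://mongo:27017"},
    dotenv={"LLAMAPARSE_API_KEY": "llx-test", "MONGODB_URL": "mongodb://ignored:27017"},
)
```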
15
backend/app/core/__init__.py
Normal file
@@ -0,0 +1,15 @@
from .auth import get_current_user, get_current_active_user, get_current_admin_user, has_index_access, require_index_access
from .security import verify_password, get_password_hash, create_access_token, verify_token
# Cache import removed - caching disabled for data freshness

__all__ = [
    "get_current_user",
    "get_current_active_user",
    "get_current_admin_user",
    "has_index_access",
    "require_index_access",
    "verify_password",
    "get_password_hash",
    "create_access_token",
    "verify_token"
]
81
backend/app/core/auth.py
Normal file
@@ -0,0 +1,81 @@
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from typing import Optional
from motor.motor_asyncio import AsyncIOMotorDatabase
from bson import ObjectId
from .security import verify_token
from ..config.database import get_database
from ..models.user import UserInDB, UserRole

security = HTTPBearer()

async def get_current_user(
    credentials: HTTPAuthorizationCredentials = Depends(security),
    db: AsyncIOMotorDatabase = Depends(get_database)
) -> UserInDB:
    """Get current authenticated user"""
    token = credentials.credentials
    payload = verify_token(token)

    user_id = payload.get("sub")
    if user_id is None:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Could not validate credentials",
            headers={"WWW-Authenticate": "Bearer"},
        )

    user = await db.users.find_one({"_id": ObjectId(user_id)})
    if user is None:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="User not found",
            headers={"WWW-Authenticate": "Bearer"},
        )

    return UserInDB(**user)

async def get_current_active_user(
    current_user: UserInDB = Depends(get_current_user)
) -> UserInDB:
    """Get current active user"""
    if not current_user.is_active:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Inactive user"
        )
    return current_user

async def get_current_admin_user(
    current_user: UserInDB = Depends(get_current_active_user)
) -> UserInDB:
    """Get current admin user"""
    if current_user.role != UserRole.ADMIN:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Not enough permissions"
        )
    return current_user

async def has_index_access(
    index_id: str,
    current_user: UserInDB = Depends(get_current_active_user)
) -> bool:
    """Check if user has access to specific index"""
    # Admin users have access to all indices
    if current_user.role == UserRole.ADMIN:
        return True

    # Check if user has explicit access to this index
    return index_id in current_user.index_access

def require_index_access(index_id: str):
    """Dependency to require index access"""
    async def check_access(current_user: UserInDB = Depends(get_current_active_user)):
        if not await has_index_access(index_id, current_user):
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
                detail="Access denied to this index"
            )
        return current_user
    return check_access
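The access check reduces to one rule: admins reach every index, and other users reach only the ids listed in `index_access`. `require_index_access` is a dependency factory, so the `index_id` it checks is fixed when the route is declared. The framework-free sketch below replays that logic with hypothetical `Role`/`User` types and a plain `PermissionError` in place of `HTTPException(403)`.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum


class Role(str, Enum):  # hypothetical mirror of UserRole
    ADMIN = "admin"
    USER = "user"


@dataclass
class User:  # hypothetical mirror of the UserInDB fields used here
    role: Role
    index_access: list = field(default_factory=list)


async def has_index_access(index_id: str, user: User) -> bool:
    if user.role == Role.ADMIN:
        return True  # admins can reach every index
    return index_id in user.index_access


def require_index_access(index_id: str):
    """Factory: the returned coroutine closes over index_id."""
    async def check_access(user: User) -> User:
        if not await has_index_access(index_id, user):
            raise PermissionError("Access denied to this index")  # 403 in the API
        return user
    return check_access


async def demo() -> tuple:
    check = require_index_access("acme-contracts")
    admin = await check(User(Role.ADMIN))
    member = await check(User(Role.USER, ["acme-contracts"]))
    try:
        await check(User(Role.USER, ["other-index"]))
        outsider_allowed = True
    except PermissionError:
        outsider_allowed = False
    return admin.role, member.role, outsider_allowed


admin_role, member_role, outsider_allowed = asyncio.run(demo())
```

Because the id is captured at declaration time, this factory cannot read a path parameter such as `/{index_id}`; for routes like that, a dependency that declares `index_id: str` in its own signature lets FastAPI inject the path value.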
160
backend/app/core/cache.py
Normal file
@@ -0,0 +1,160 @@
import json
import hashlib
from typing import Optional, Any
from ..config.database import get_redis
from ..config.settings import settings

class CacheService:
    def __init__(self):
        self.redis = None
        self.enabled = settings.cache_enabled

    async def get_redis(self):
        """Get Redis client"""
        if not self.redis:
            self.redis = get_redis()
        return self.redis

    def _generate_key(self, prefix: str, *args) -> str:
        """Generate cache key"""
        key_data = f"{prefix}:{':'.join(str(arg) for arg in args)}"
        return hashlib.md5(key_data.encode()).hexdigest()

    async def get(self, key: str) -> Optional[Any]:
        """Get value from cache"""
        if not self.enabled:
            return None

        redis_client = await self.get_redis()
        if not redis_client:
            return None

        try:
            cached_data = await redis_client.get(key)
            if cached_data:
                return json.loads(cached_data)
        except Exception as e:
            print(f"Cache get error: {e}")

        return None

    async def set(self, key: str, value: Any, ttl: Optional[int] = None) -> bool:
        """Set value in cache"""
        if not self.enabled:
            return False

        redis_client = await self.get_redis()
        if not redis_client:
            return False

        try:
            serialized_value = json.dumps(value, default=str)
            ttl = ttl or settings.cache_ttl
            await redis_client.setex(key, ttl, serialized_value)
            return True
        except Exception as e:
            print(f"Cache set error: {e}")
            return False

    async def delete(self, key: str) -> bool:
        """Delete value from cache"""
        if not self.enabled:
            return False

        redis_client = await self.get_redis()
        if not redis_client:
            return False

        try:
            await redis_client.delete(key)
            return True
        except Exception as e:
            print(f"Cache delete error: {e}")
            return False

    async def clear_pattern(self, pattern: str) -> bool:
        """Clear cache entries matching pattern"""
        if not self.enabled:
            return False

        redis_client = await self.get_redis()
        if not redis_client:
            return False

        try:
            keys = await redis_client.keys(pattern)
            if keys:
                await redis_client.delete(*keys)
            return True
        except Exception as e:
            print(f"Cache clear pattern error: {e}")
            return False

    def get_chat_cache_key(self, query: str, index_id: str) -> str:
        """Generate cache key for chat responses.

        Only the query is hashed; the "chat:<index_id>:" prefix stays readable
        so clear_index_cache() can find every chat entry for an index.
        """
        query_hash = hashlib.md5(query.encode()).hexdigest()
        return f"chat:{index_id}:{query_hash}"

    def get_document_cache_key(self, index_id: str) -> str:
        """Generate cache key for document lists"""
        return self._generate_key("documents", index_id)

    def get_index_cache_key(self, user_id: str) -> str:
        """Generate cache key for user indices"""
        return self._generate_key("indices", user_id)

    def get_all_cache_keys_for_index(self, index_id: str) -> list[str]:
        """Get all cache keys that should be cleared for a specific index"""
        return [
            self.get_document_cache_key(index_id),
            # Chat cache keys are query-specific, so we'll use pattern matching for those
        ]

    async def clear_index_cache(self, index_id: str) -> bool:
        """Clear only cache entries for specific index"""
        if not self.enabled:
            return False

        try:
            # Clear document cache for this index
            document_key = self.get_document_cache_key(index_id)
            await self.delete(document_key)

            # Clear chat cache entries for this index. Chat keys have the form
            # "chat:<index_id>:<md5(query)>" (see get_chat_cache_key), so an
            # incremental SCAN with a prefix pattern finds them without a
            # blocking KEYS * over the whole keyspace.
            redis_client = await self.get_redis()
            if redis_client:
                chat_keys_to_delete = [
                    key async for key in redis_client.scan_iter(match=f"chat:{index_id}:*")
                ]

                if chat_keys_to_delete:
                    await redis_client.delete(*chat_keys_to_delete)

            return True

        except Exception as e:
            print(f"Error clearing cache for index {index_id}: {e}")
            return False

    async def clear_user_index_cache(self, user_id: str) -> bool:
        """Clear index cache for a specific user"""
        if not self.enabled:
            return False

        try:
            user_index_key = self.get_index_cache_key(user_id)
            await self.delete(user_index_key)
            return True
        except Exception as e:
            print(f"Error clearing user index cache for {user_id}: {e}")
            return False

# Global cache instance
cache = CacheService()
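`_generate_key` returns a bare MD5 hex digest of `prefix:arg1:arg2...`, so neither the prefix nor the index id survives in the stored key, and no suffix test or Redis `MATCH` pattern can pick out one index's entries from such keys. The sketch contrasts that with a namespaced key that hashes only the query. It uses `fnmatchcase` as an approximation of Redis glob matching (both understand `*`, `?` and `[...]`); `namespaced_chat_key` is an illustrative helper.

```python
import hashlib
from fnmatch import fnmatchcase


def hashed_key(prefix: str, *args) -> str:
    """Same construction as CacheService._generate_key."""
    key_data = f"{prefix}:{':'.join(str(arg) for arg in args)}"
    return hashlib.md5(key_data.encode()).hexdigest()


def namespaced_chat_key(query: str, index_id: str) -> str:
    """Illustrative alternative: hash only the query, keep the namespace readable."""
    return f"chat:{index_id}:{hashlib.md5(query.encode()).hexdigest()}"


index_id = "acme-contracts"
query = "What is the termination notice period?"

opaque = hashed_key("chat", query, index_id)
readable = namespaced_chat_key(query, index_id)

# The opaque key is 32 hex characters with no ":" left to match against.
opaque_matches = fnmatchcase(opaque, f"chat:{index_id}:*") or opaque.endswith(f":{index_id}")
readable_matches = fnmatchcase(readable, f"chat:{index_id}:*")
```

If index ids can contain `*`, `?` or `[`, they would need escaping before being placed in a pattern.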
54
backend/app/core/chroma_client.py
Normal file
@@ -0,0 +1,54 @@
"""
|
||||
Shared ChromaDB client singleton to prevent initialization conflicts
|
||||
"""
|
||||
import chromadb
|
||||
from chromadb.config import Settings as ChromaSettings
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
class ChromaDBSingleton:
|
||||
"""Singleton ChromaDB client to prevent 'different settings' errors"""
|
||||
_instance: Optional['ChromaDBSingleton'] = None
|
||||
_client: Optional[chromadb.PersistentClient] = None
|
||||
|
||||
def __new__(cls):
|
||||
if cls._instance is None:
|
||||
cls._instance = super().__new__(cls)
|
||||
return cls._instance
|
||||
|
||||
def get_client(self, chroma_db_path: str) -> chromadb.PersistentClient:
|
||||
"""Get or create ChromaDB client with consistent settings"""
|
||||
if self._client is None:
|
||||
try:
|
||||
# Use consistent settings across all services
|
||||
settings = ChromaSettings(anonymized_telemetry=False)
|
||||
self._client = chromadb.PersistentClient(
|
||||
path=chroma_db_path,
|
||||
settings=settings
|
||||
)
|
||||
print(f"Created shared ChromaDB client at {chroma_db_path}")
|
||||
except Exception as e:
|
||||
print(f"Error creating shared ChromaDB client: {e}")
|
||||
# Try with minimal settings if the above fails
|
||||
try:
|
||||
self._client = chromadb.PersistentClient(path=chroma_db_path)
|
||||
print(f"Created ChromaDB client with default settings at {chroma_db_path}")
|
||||
except Exception as e2:
|
||||
print(f"Failed to create ChromaDB client: {e2}")
|
||||
raise e2
|
||||
return self._client
|
||||
|
||||
@classmethod
|
||||
def reset(cls):
|
||||
"""Reset the singleton (useful for testing or reinitialization)"""
|
||||
if cls._instance and cls._instance._client:
|
||||
# Close existing client if possible
|
||||
try:
|
||||
cls._instance._client.reset()
|
||||
except:
|
||||
pass
|
||||
cls._instance = None
|
||||
cls._client = None
|
||||
|
||||
# Global singleton instance
|
||||
chroma_singleton = ChromaDBSingleton()
|
||||
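`ChromaDBSingleton` combines two lazy steps: `__new__` hands back one shared instance, and `get_client` builds the `PersistentClient` only on first use. One consequence is that `chroma_db_path` is honoured only on the first call; later calls with a different path silently receive the original client. The stdlib sketch below reproduces the pattern with a hypothetical counting factory in place of `chromadb.PersistentClient`.

```python
from typing import Callable, Optional


class LazySingleton:
    """Same shape as ChromaDBSingleton, minus chromadb."""
    _instance: Optional["LazySingleton"] = None
    _client: Optional[object] = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def get_client(self, path: str, factory: Callable[[str], object]) -> object:
        if self._client is None:
            self._client = factory(path)  # the first caller's path wins
        return self._client


created = []


def fake_factory(path: str) -> dict:  # hypothetical stand-in for PersistentClient
    created.append(path)
    return {"path": path}


first = LazySingleton().get_client("./indices/chroma", fake_factory)
second = LazySingleton().get_client("./somewhere/else", fake_factory)
```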
41
backend/app/core/security.py
Normal file
@@ -0,0 +1,41 @@
from datetime import datetime, timedelta
from typing import Optional, Union
from jose import JWTError, jwt
from passlib.context import CryptContext
from fastapi import HTTPException, status
from ..config.settings import settings

# Password hashing
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

def verify_password(plain_password: str, hashed_password: str) -> bool:
    """Verify a password against its hash"""
    return pwd_context.verify(plain_password, hashed_password)

def get_password_hash(password: str) -> str:
    """Hash a password"""
    return pwd_context.hash(password)

def create_access_token(data: dict, expires_delta: Optional[timedelta] = None) -> str:
    """Create a JWT access token"""
    to_encode = data.copy()
    if expires_delta:
        expire = datetime.utcnow() + expires_delta
    else:
        expire = datetime.utcnow() + timedelta(minutes=settings.jwt_expire_minutes)

    to_encode.update({"exp": expire})
    encoded_jwt = jwt.encode(to_encode, settings.jwt_secret_key, algorithm=settings.jwt_algorithm)
    return encoded_jwt

def verify_token(token: str) -> dict:
    """Verify and decode a JWT token"""
    try:
        payload = jwt.decode(token, settings.jwt_secret_key, algorithms=[settings.jwt_algorithm])
        return payload
    except JWTError:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Could not validate credentials",
            headers={"WWW-Authenticate": "Bearer"},
        )
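`create_access_token` and `verify_token` delegate to python-jose, signing with HS256 and `settings.jwt_secret_key` and, by default, setting `exp` to `jwt_expire_minutes` (180) in the future. To show what such a token contains, here is a stdlib-only sketch of HS256 signing and verification: a base64url header and payload, an HMAC-SHA256 signature over both, then an expiry check with `exp` as integer epoch seconds. It is a sketch for understanding the format; the application itself keeps using the library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def encode_hs256(claims: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    signature = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # pin the algorithm, like algorithms=[...]
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("signature mismatch")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")  # rejected on or after exp
    return claims


token = encode_hs256({"sub": "64f0c0ffee", "exp": 2_000}, "test-secret")
```

The signature only proves integrity: the payload is merely base64url-encoded, so anything placed in the token (here the user id in `sub`) is readable by whoever holds it.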
86
backend/app/main.py
Normal file
@@ -0,0 +1,86 @@
from fastapi import FastAPI, Request  # type: ignore
from fastapi.middleware.cors import CORSMiddleware  # type: ignore
from fastapi.responses import JSONResponse  # type: ignore
import time
import uvicorn  # type: ignore

from .config import settings, connect_to_mongo, close_mongo_connection, connect_to_redis, close_redis_connection
from .api.v1 import auth, documents, indices, chat, admin

# Create FastAPI app
app = FastAPI(
    title="Contract Analysis API",
    description="FastAPI backend for intelligent contract analysis and document Q&A",
    version="2.0.0",
    docs_url="/docs" if settings.debug else None,
    redoc_url="/redoc" if settings.debug else None,
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Add request timing middleware
@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    response.headers["X-Process-Time"] = str(process_time)
    return response

# Exception handlers
@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
    if settings.debug:
        return JSONResponse(
            status_code=500,
            content={"detail": str(exc), "type": type(exc).__name__}
        )
    return JSONResponse(
        status_code=500,
        content={"detail": "Internal server error"}
    )

# Startup and shutdown events
@app.on_event("startup")
async def startup_event():
    await connect_to_mongo()
    await connect_to_redis()
    print("✅ Application startup complete")

@app.on_event("shutdown")
async def shutdown_event():
    await close_mongo_connection()
    await close_redis_connection()
    print("✅ Application shutdown complete")

# Include routers
app.include_router(auth.router, prefix="/api/v1/auth", tags=["authentication"])
app.include_router(documents.router, prefix="/api/v1/documents", tags=["documents"])
app.include_router(indices.router, prefix="/api/v1/indices", tags=["indices"])
app.include_router(chat.router, prefix="/api/v1/chat", tags=["chat"])
app.include_router(admin.router, prefix="/api/v1/admin", tags=["admin"])

# Health check endpoint
@app.get("/health")
async def health_check():
    return {"status": "healthy", "version": "2.0.0"}

# Root endpoint
@app.get("/")
async def root():
    return {"message": "Contract Analysis API", "version": "2.0.0"}

if __name__ == "__main__":
    uvicorn.run(
        "app.main:app",
        host="0.0.0.0",
        port=8000,
        reload=settings.debug
    )
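The `X-Process-Time` middleware measures elapsed time with `time.time()`, which follows the wall clock and can jump if the system clock is adjusted; `time.perf_counter()` is monotonic and is the usual choice for durations. A framework-free sketch of the same wrap-the-handler pattern, with hypothetical dict-based requests and responses in place of Starlette's objects:

```python
import asyncio
import time
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[dict]]


async def add_process_time_header(request: dict, call_next: Handler) -> dict:
    start = time.perf_counter()  # monotonic, unaffected by clock changes
    response = await call_next(request)
    elapsed = time.perf_counter() - start
    response.setdefault("headers", {})["X-Process-Time"] = f"{elapsed:.6f}"
    return response


async def slow_handler(request: dict) -> dict:  # hypothetical endpoint
    await asyncio.sleep(0.02)
    return {"status": 200, "body": {"status": "healthy", "version": "2.0.0"}}


response = asyncio.run(add_process_time_header({"path": "/health"}, slow_handler))
elapsed_reported = float(response["headers"]["X-Process-Time"])
```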
25
backend/app/models/__init__.py
Normal file
@@ -0,0 +1,25 @@
from .user import User, UserCreate, UserUpdate, UserInDB, UserRole
from .document import Document, DocumentCreate, DocumentUpdate, DocumentInDB
from .index import Index, IndexCreate, IndexUpdate, IndexInDB
from .chat import ChatMessage, ChatMessageCreate, ChatMessageInDB, ChatQuery, ChatResponse

__all__ = [
    "User",
    "UserCreate",
    "UserUpdate",
    "UserInDB",
    "UserRole",
    "Document",
    "DocumentCreate",
    "DocumentUpdate",
    "DocumentInDB",
    "Index",
    "IndexCreate",
    "IndexUpdate",
    "IndexInDB",
    "ChatMessage",
    "ChatMessageCreate",
    "ChatMessageInDB",
    "ChatQuery",
    "ChatResponse"
]
53
backend/app/models/chat.py
Normal file
@@ -0,0 +1,53 @@
from pydantic import BaseModel, Field
from typing import Optional, List, Dict, Any
from datetime import datetime
from bson import ObjectId
from .user import PyObjectId

class ChatMessageBase(BaseModel):
    user_id: PyObjectId
    index_id: str
    query: str
    response: str
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
    deleted_by_user: bool = False

class ChatMessageCreate(ChatMessageBase):
    pass

class ChatMessageInDB(ChatMessageBase):
    id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
    debug_info: Dict[str, Any] = Field(default_factory=dict)
    response_time: float = 0.0
    cached: bool = False
    sources: List[Dict[str, Any]] = Field(default_factory=list)
    context_used: Optional[str] = None

    class Config:
        populate_by_name = True
        arbitrary_types_allowed = True
        json_encoders = {ObjectId: str}

class ChatMessage(ChatMessageBase):
    id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
    debug_info: Dict[str, Any] = Field(default_factory=dict)
    response_time: float = 0.0
    cached: bool = False
    sources: List[Dict[str, Any]] = Field(default_factory=list)
    context_used: Optional[str] = None

    class Config:
        populate_by_name = True
        arbitrary_types_allowed = True
        json_encoders = {ObjectId: str}

class ChatQuery(BaseModel):
    query: str
    index_id: str

class ChatResponse(BaseModel):
    response: str
    debug_info: Dict[str, Any] = Field(default_factory=dict)
    cached: bool = False
    response_time: float = 0.0
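`ChatQuery` appears to be the request body for a chat call (a question plus the `index_id` to search), and `ChatResponse` the reply, with every field except `response` defaulted. The dataclass mirror below illustrates those shapes without pydantic. It also shows what `Field(default_factory=dict)` buys for `debug_info`: each response gets its own dict. Plain dataclasses go further and reject a shared mutable `{}` default outright.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ChatQuery:  # mirrors the pydantic ChatQuery fields
    query: str
    index_id: str


@dataclass
class ChatResponse:  # mirrors the pydantic ChatResponse fields
    response: str
    debug_info: dict = field(default_factory=dict)  # fresh dict per instance
    cached: bool = False
    response_time: float = 0.0


payload = json.dumps(asdict(ChatQuery(query="Who are the parties?", index_id="acme-contracts")))

first = ChatResponse(response="Acme Corp and Example Agency.")
second = ChatResponse(**{"response": "Not found.", "response_time": 1.25})
first.debug_info["retrieved_chunks"] = 4  # does not leak into `second`
```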
110
backend/app/models/contract_summary.py
Normal file
@@ -0,0 +1,110 @@
from pydantic import BaseModel, Field
from typing import Optional, Dict, Any

class ScopeOfWork(BaseModel):
    summary_tasks_deliverables: Optional[str] = None
    key_dates: Optional[str] = None
    key_kpis: Optional[str] = None

class TermsAndTermination(BaseModel):
    duration: Optional[str] = None
    termination_conditions: Optional[str] = None
    penalties: Optional[str] = None

class PaymentTerms(BaseModel):
    payment_method: Optional[str] = None
    payment_schedule: Optional[str] = None
    pricing_details: Optional[str] = None
    mark_ups: Optional[str] = None
    payment_schedules: Optional[str] = None
    late_payment_penalties: Optional[str] = None
    discounts: Optional[str] = None

class LiabilityIndemnification(BaseModel):
    responsibilities_liabilities: Optional[str] = None
    damages_losses: Optional[str] = None
    indemnification_clauses: Optional[str] = None

class Confidentiality(BaseModel):
    scope: Optional[str] = None
    duration: Optional[str] = None
    exceptions: Optional[str] = None
    disclosures_by_law: Optional[str] = None
    breach_consequences: Optional[str] = None

class IntellectualProperty(BaseModel):
    licensor: Optional[str] = None
    licensee: Optional[str] = None
    terms_renewal: Optional[str] = None
    pricing: Optional[str] = None
    definitions: Optional[str] = None
    scope: Optional[str] = None
    duration: Optional[str] = None
    territory: Optional[str] = None
    use_ownership_rights: Optional[str] = None

class DisputeResolution(BaseModel):
    methods: Optional[str] = None
    mediation_options: Optional[str] = None
    arbitration_options: Optional[str] = None
    litigation_options: Optional[str] = None

class WarrantiesRepresentations(BaseModel):
    service_standards: Optional[str] = None
    service_assurances: Optional[str] = None

class ComplianceWithLaws(BaseModel):
    relevant_laws: Optional[str] = None
    owner_obligations: Optional[str] = None

class AmendmentsVersions(BaseModel):
    change_management: Optional[str] = None
    written_consent: Optional[str] = None

class AssignmentSubcontracting(BaseModel):
    delegation_assignment: Optional[str] = None

class ContractSummary(BaseModel):
    """Complete contract summary schema matching the reference implementation"""

    # Basic contract information
    contract_type: Optional[str] = None
    overview_purpose: Optional[str] = None
    relevant_account: Optional[str] = None
    in_studio_name: Optional[str] = None
    client_sender_name: Optional[str] = None
    client_sender_address: Optional[str] = None
    agency_name: Optional[str] = None
    agency_address: Optional[str] = None
    dates_signed: Optional[str] = None
    terms: Optional[str] = None
    date_expired: Optional[str] = None
    pricing_payment_terms: Optional[str] = None

    # Complex nested sections
    scope_of_work: Optional[ScopeOfWork] = None
    terms_and_termination: Optional[TermsAndTermination] = None
    payment_terms: Optional[PaymentTerms] = None
    liability_indemnification: Optional[LiabilityIndemnification] = None
    confidentiality: Optional[Confidentiality] = None
    intellectual_property: Optional[IntellectualProperty] = None
    dispute_resolution: Optional[DisputeResolution] = None
    warranties_representations: Optional[WarrantiesRepresentations] = None
    compliance_with_laws: Optional[ComplianceWithLaws] = None
    amendments_versions: Optional[AmendmentsVersions] = None
    assignment_subcontracting: Optional[AssignmentSubcontracting] = None

    class Config:
        json_encoders = {
            type(None): lambda v: "N/A (Not found in Doc)"
        }

class ContractSummaryResponse(BaseModel):
    """Response model for contract summary API"""
    success: bool
    summary: Optional[ContractSummary] = None
    status: str
    filename: Optional[str] = None
    created_at: Optional[str] = None
    updated_at: Optional[str] = None
    error: Optional[str] = None
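`ContractSummary.Config` maps `None` to the placeholder `"N/A (Not found in Doc)"` through `json_encoders`. Pydantic v2 deprecates `json_encoders`, and whether an encoder keyed on `type(None)` is consulted depends on the Pydantic version in use, so an explicit pass over the dumped summary is a version-independent way to get the same output. A minimal sketch, assuming the summary has already been converted to plain dicts (for example with `model_dump()`); `fill_not_found` is a hypothetical helper.

```python
from typing import Any

NOT_FOUND = "N/A (Not found in Doc)"  # placeholder used in ContractSummary.Config


def fill_not_found(value: Any) -> Any:
    """Recursively replace None with the placeholder in dicts and lists."""
    if value is None:
        return NOT_FOUND
    if isinstance(value, dict):
        return {key: fill_not_found(item) for key, item in value.items()}
    if isinstance(value, list):
        return [fill_not_found(item) for item in value]
    return value


dumped = {
    "contract_type": "Master Services Agreement",
    "date_expired": None,
    "scope_of_work": {"key_dates": "Kick-off 2024-03-01", "key_kpis": None},
    "payment_terms": None,
}
filled = fill_not_found(dumped)
```

A nested section that is missing entirely (`payment_terms: None`) collapses to a single placeholder string rather than a dict of placeholders, so consumers must handle both shapes.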
59
backend/app/models/document.py
Normal file
@@ -0,0 +1,59 @@
from pydantic import BaseModel, Field
from typing import Optional, List, Dict, Any
from datetime import datetime
from bson import ObjectId
from .user import PyObjectId

class DocumentBase(BaseModel):
    filename: str
    original_filename: str
    file_size: int
    content_type: str
    index_id: str
    uploaded_by: PyObjectId
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

class DocumentCreate(DocumentBase):
    pass

class DocumentUpdate(BaseModel):
    filename: Optional[str] = None
    updated_at: Optional[datetime] = None

class DocumentInDB(DocumentBase):
    id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
    file_path: str
    processing_status: str = "pending"  # pending, processing, completed, failed
    metadata: Dict[str, Any] = Field(default_factory=dict)
    parsed_text: Optional[str] = None
    text_chunks: Optional[List[str]] = None
    embedding_status: str = "pending"  # pending, processing, completed, failed
    chunk_count: int = 0
    vector_ids: Optional[List[str]] = None
    contract_summary: Optional[Dict[str, Any]] = None
    summary_status: str = "pending"  # pending, processing, completed, failed
    summary_created_at: Optional[datetime] = None

    class Config:
        populate_by_name = True
        arbitrary_types_allowed = True
        json_encoders = {ObjectId: str}

class Document(DocumentBase):
    id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
    processing_status: str = "pending"
    metadata: Dict[str, Any] = Field(default_factory=dict)
    parsed_text: Optional[str] = None
    text_chunks: Optional[List[str]] = None
    embedding_status: str = "pending"
    chunk_count: int = 0
    vector_ids: Optional[List[str]] = None
    contract_summary: Optional[Dict[str, Any]] = None
    summary_status: str = "pending"
    summary_created_at: Optional[datetime] = None

    class Config:
        populate_by_name = True
        arbitrary_types_allowed = True
        json_encoders = {ObjectId: str}
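`processing_status`, `embedding_status` and `summary_status` are free-form strings whose allowed values (`pending`, `processing`, `completed`, `failed`) exist only in comments. One possible tightening, sketched here and not part of this commit, is a `str`-backed `Enum` in the same style as `UserRole`. Members compare equal to their string values, so stored documents and queries keep working, while a mistyped status fails immediately. `ProcessingStatus` is a hypothetical name.

```python
from enum import Enum


class ProcessingStatus(str, Enum):
    """Hypothetical enum for the status values listed in the DocumentInDB comments."""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


stored = {"processing_status": "completed", "embedding_status": "failed"}  # raw Mongo fields

processing = ProcessingStatus(stored["processing_status"])
embedding = ProcessingStatus(stored["embedding_status"])

try:
    ProcessingStatus("done")  # a mistyped status is rejected
    typo_accepted = True
except ValueError:
    typo_accepted = False
```

Annotating the model fields with such an enum would let pydantic apply the same check whenever a document is loaded.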
52
backend/app/models/index.py
Normal file
@@ -0,0 +1,52 @@
from pydantic import BaseModel, Field
from typing import Optional, List, Dict, Any
from datetime import datetime
from bson import ObjectId
from .user import PyObjectId

class IndexBase(BaseModel):
    name: str
    description: Optional[str] = None
    created_by: PyObjectId
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

class IndexCreate(IndexBase):
    pass

class IndexUpdate(BaseModel):
    name: Optional[str] = None
    description: Optional[str] = None
    updated_at: Optional[datetime] = None

class IndexInDB(IndexBase):
    id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
    index_id: str  # Unique string identifier for the index
    status: str = "active"  # active, inactive, deleted
    document_count: int = 0
    settings: Dict[str, Any] = Field(default_factory=dict)
    vector_store_path: Optional[str] = None
    embedding_model: str = "text-embedding-ada-002"
    chunk_size: int = 1000
    chunk_overlap: int = 200

    class Config:
        populate_by_name = True
        arbitrary_types_allowed = True
        json_encoders = {ObjectId: str}

class Index(IndexBase):
    id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
    index_id: str
    status: str = "active"
    document_count: int = 0
    settings: Dict[str, Any] = Field(default_factory=dict)
    vector_store_path: Optional[str] = None
    embedding_model: str = "text-embedding-ada-002"
    chunk_size: int = 1000
    chunk_overlap: int = 200

    class Config:
        populate_by_name = True
        arbitrary_types_allowed = True
        json_encoders = {ObjectId: str}
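Each index records `chunk_size = 1000` and `chunk_overlap = 200`: consecutive chunks share 200 units, so each new chunk starts 800 units after the previous one. The splitter that actually consumes these settings is not shown here and may count tokens rather than characters, so the sliding-window sketch below, which counts characters, only approximates the idea; `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Fixed-size sliding window over characters (illustrative only)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # 800 with the index defaults
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


sample = "".join(chr(ord("a") + i % 26) for i in range(2600))
chunks = chunk_text(sample)
```

For a 2,600-character text the windows start at 0, 800 and 1,600, giving three chunks, and the last 200 characters of each chunk reappear at the start of the next.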
105
backend/app/models/user.py
Normal file
@@ -0,0 +1,105 @@
from pydantic import BaseModel, Field, EmailStr
from typing import Optional, List, Dict
from datetime import datetime
from bson import ObjectId
from enum import Enum

class UserRole(str, Enum):
    ADMIN = "admin"
    USER = "user"

class AuthMethod(str, Enum):
    LOCAL = "local"
    SSO = "sso"

class PyObjectId(ObjectId):
    @classmethod
    def __get_pydantic_core_schema__(cls, _source_type, _handler):
        from pydantic_core import core_schema

        def validate_from_str(input_value: str) -> ObjectId:
            return ObjectId(input_value)

        def validate_from_objectid(input_value: ObjectId) -> ObjectId:
            return input_value

        return core_schema.union_schema([
            core_schema.is_instance_schema(ObjectId),
            core_schema.no_info_plain_validator_function(
                validate_from_str,
                serialization=core_schema.to_string_ser_schema(),
            ),
        ])

    @classmethod
    def __get_pydantic_json_schema__(cls, core_schema, handler):
        json_schema = handler(core_schema)
        json_schema.update(type="string")
        return json_schema

class UserBase(BaseModel):
    email: EmailStr
    role: UserRole = UserRole.USER
    is_active: bool = True
    auth_method: AuthMethod = AuthMethod.LOCAL
    sso_provider: Optional[str] = None
    sso_user_id: Optional[str] = None
    sso_email: Optional[str] = None
    sso_name: Optional[str] = None
    sso_attributes: Optional[Dict] = None
    last_sso_login: Optional[datetime] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

class UserCreate(UserBase):
    password: str

class UserUpdate(BaseModel):
    email: Optional[EmailStr] = None
    role: Optional[UserRole] = None
    is_active: Optional[bool] = None
    password: Optional[str] = None
    auth_method: Optional[AuthMethod] = None
    sso_provider: Optional[str] = None
    sso_user_id: Optional[str] = None
    sso_email: Optional[str] = None
|
||||
sso_name: Optional[str] = None
|
||||
sso_attributes: Optional[Dict] = None
|
||||
last_sso_login: Optional[datetime] = None
|
||||
|
||||
class UserInDB(UserBase):
|
||||
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
|
||||
hashed_password: Optional[str] = None # Optional for SSO users
|
||||
index_access: List[str] = Field(default_factory=list) # List of index IDs user has access to
|
||||
|
||||
class Config:
|
||||
populate_by_name = True
|
||||
arbitrary_types_allowed = True
|
||||
json_encoders = {ObjectId: str}
|
||||
|
||||
class User(UserBase):
|
||||
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
|
||||
index_access: List[str] = Field(default_factory=list)
|
||||
|
||||
class Config:
|
||||
populate_by_name = True
|
||||
arbitrary_types_allowed = True
|
||||
json_encoders = {ObjectId: str}
|
||||
|
||||
class UserResponse(BaseModel):
|
||||
id: PyObjectId = Field(alias="_id")
|
||||
email: EmailStr
|
||||
role: UserRole
|
||||
is_active: bool
|
||||
auth_method: AuthMethod
|
||||
sso_provider: Optional[str] = None
|
||||
sso_name: Optional[str] = None
|
||||
last_sso_login: Optional[datetime] = None
|
||||
index_access: List[str]
|
||||
created_at: Optional[datetime] = None
|
||||
updated_at: Optional[datetime] = None
|
||||
|
||||
class Config:
|
||||
populate_by_name = True
|
||||
arbitrary_types_allowed = True
|
||||
json_encoders = {ObjectId: str}
|
||||
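`PyObjectId` accepts either an `ObjectId` instance or a string, and the string branch has to reject malformed IDs. The sketch below shows that check without depending on `bson`: a MongoDB ObjectId is 12 bytes, written as 24 hexadecimal characters. `parse_object_id_hex` is a hypothetical stand-in (in the model itself, `ObjectId.is_valid` and `ObjectId(...)` do this work); the point it illustrates is that Pydantic only turns `ValueError` or `AssertionError` raised in a validator into a `ValidationError`, so the helper raises `ValueError`.

```python
import re

# 24 hexadecimal characters: the string form of a 12-byte MongoDB ObjectId
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def parse_object_id_hex(value: object) -> bytes:
    """Validate an ObjectId string and return its 12 raw bytes.

    Hypothetical stand-in for bson.ObjectId(...): it raises ValueError, which
    Pydantic reports as a ValidationError, instead of bson's InvalidId.
    """
    if not isinstance(value, str) or not _OBJECT_ID_RE.fullmatch(value):
        raise ValueError(f"Invalid ObjectId: {value!r}")
    return bytes.fromhex(value)
```

Using `fullmatch` rather than `search` ensures that a valid ID embedded in a longer string is still rejected.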
7
backend/app/services/__init__.py
Normal file
@ -0,0 +1,7 @@
from .document_processor import document_processor
from .rag_service import rag_service

__all__ = [
    "document_processor",
    "rag_service"
]
192
backend/app/services/chat_context_service.py
Normal file
@ -0,0 +1,192 @@
import asyncio
from typing import List, Dict, Any, Optional
from datetime import datetime, timedelta
from motor.motor_asyncio import AsyncIOMotorDatabase
from bson import ObjectId

# Cache import removed - caching disabled for data freshness
from ..config.settings import settings
from .llama_processor import llama_processor


class ChatContextService:
    def __init__(self):
        self.max_context_messages = 10  # Maximum number of previous messages to include
        self.context_window_hours = 24  # Context window in hours

    async def get_conversation_context(
        self,
        user_id: str,
        index_id: str,
        db: AsyncIOMotorDatabase,
        limit: Optional[int] = None
    ) -> List[Dict[str, Any]]:
        """Get recent conversation context for the user and index"""
        try:
            # Use provided limit or default
            message_limit = limit or self.max_context_messages

            # Get recent messages within the context window
            cutoff_time = datetime.utcnow() - timedelta(hours=self.context_window_hours)

            cursor = db.chat_messages.find({
                "user_id": ObjectId(user_id),
                "index_id": index_id,
                "created_at": {"$gte": cutoff_time},
                "deleted_by_user": {"$ne": True}
            }).sort("created_at", -1).limit(message_limit)

            messages = []
            async for msg in cursor:
                messages.append({
                    "query": msg["query"],
                    "response": msg["response"],
                    "created_at": msg["created_at"]
                })

            # Return in chronological order (oldest first)
            return list(reversed(messages))

        except Exception as e:
            print(f"Error getting conversation context: {e}")
            return []

    def format_context_for_ai(self, context_messages: List[Dict[str, Any]]) -> str:
        """Format conversation context for AI prompt"""
        if not context_messages:
            return ""

        formatted_context = []
        for msg in context_messages:
            formatted_context.append(f"User: {msg['query']}")
            formatted_context.append(f"Assistant: {msg['response']}")

        return "\n".join(formatted_context)

    async def generate_contextual_response(
        self,
        query: str,
        index_id: str,
        user_id: str,
        db: AsyncIOMotorDatabase,
        context_chunks: List[str]
    ) -> Dict[str, Any]:
        """Generate response with conversation context"""
        try:
            # Get conversation context
            context_messages = await self.get_conversation_context(
                user_id, index_id, db
            )

            # Format conversation context
            conversation_context = self.format_context_for_ai(context_messages)

            # Prepare document context
            document_context = "\n\n".join(context_chunks)

            # Create enhanced prompt with conversation context
            prompt = self._create_contextual_prompt(
                query, document_context, conversation_context
            )

            # Generate response using OpenAI
            from llama_index.llms.openai import OpenAI
            llm = OpenAI(
                model="gpt-4o",
                api_key=settings.openai_api_key,
                temperature=0.1
            )

            # Use the synchronous completion (acomplete has had issues), but run it
            # in a worker thread so the blocking call does not stall the event loop
            response = await asyncio.to_thread(llm.complete, prompt)

            return {
                "response": response.text,
                "context_used": conversation_context,
                "context_messages_count": len(context_messages)
            }

        except Exception as e:
            print(f"Error generating contextual response: {e}")
            # Fallback to basic response without context
            try:
                from llama_index.llms.openai import OpenAI
                llm = OpenAI(
                    model="gpt-4o",
                    api_key=settings.openai_api_key,
                    temperature=0.1
                )

                # Simple prompt without context
                context_text = "\n\n".join(context_chunks)
                simple_prompt = f"""Based on the following context, answer the user's question. If the answer is not in the context, say "I don't have enough information to answer that question."

Return results as pure markdown - no code block.

Context:
{context_text}

Question: {query}

Answer:"""

                response = await asyncio.to_thread(llm.complete, simple_prompt)

                return {
                    "response": response.text,
                    "context_used": None,
                    "context_messages_count": 0
                }
            except Exception as fallback_error:
                print(f"Fallback response generation failed: {fallback_error}")
                return {
                    "response": "I'm sorry, I encountered an error while processing your question. Please try again.",
                    "context_used": None,
                    "context_messages_count": 0
                }

    def _create_contextual_prompt(
        self,
        query: str,
        document_context: str,
        conversation_context: str
    ) -> str:
        """Create a contextual prompt for the AI"""
        prompt_parts = []

        prompt_parts.append(
            "You are an AI assistant helping users understand their documents. "
            "Answer questions based on the provided document context and consider "
            "the conversation history for continuity."
        )

        if conversation_context:
            prompt_parts.append(f"""
Previous conversation:
{conversation_context}
""")

        prompt_parts.append(f"""
Document context:
{document_context}

Current question: {query}

Instructions:
1. Answer based primarily on the document context provided
2. Consider the conversation history for continuity and context
3. If the answer is not in the documents, clearly state this
4. Be concise but comprehensive
5. Reference specific information from the documents when possible
6. If referring to previous parts of the conversation, be explicit about it
7. Return results as pure markdown - no code block

Answer:""")

        return "\n".join(prompt_parts)

    # cache_context_key method removed - caching disabled for data freshness


# Global chat context service instance
chat_context_service = ChatContextService()
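`get_conversation_context` asks MongoDB for the newest messages inside a 24-hour window (sorted newest first, then limited) and reverses them so the prompt reads chronologically; `format_context_for_ai` then renders them as `User:`/`Assistant:` lines. The same selection logic can be sketched in memory with the standard library, which makes it easy to unit-test. `ChatMessage`, `select_context`, and `format_context` are hypothetical names, and the list filtering stands in for the Motor query rather than reproducing it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ChatMessage:
    # Hypothetical in-memory stand-in for a document in chat_messages
    query: str
    response: str
    created_at: datetime
    deleted_by_user: bool = False


def select_context(
    messages: List[ChatMessage],
    now: datetime,
    max_messages: int = 10,
    window_hours: int = 24,
) -> List[ChatMessage]:
    """Newest `max_messages` messages inside the time window, returned oldest first."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [m for m in messages if m.created_at >= cutoff and not m.deleted_by_user]
    # Equivalent of .sort("created_at", -1).limit(max_messages) ...
    newest_first = sorted(recent, key=lambda m: m.created_at, reverse=True)[:max_messages]
    # ... followed by reversed(...) to restore chronological order
    return list(reversed(newest_first))


def format_context(messages: List[ChatMessage]) -> str:
    """Render the selected turns as alternating User/Assistant lines."""
    lines: List[str] = []
    for m in messages:
        lines.append(f"User: {m.query}")
        lines.append(f"Assistant: {m.response}")
    return "\n".join(lines)
```

Sorting newest first before truncating matters: truncating an oldest-first list would keep the stalest turns and drop the ones the user just referred to.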
238
backend/app/services/contract_summary_service.py
Normal file
@ -0,0 +1,238 @@
import json
import asyncio
from typing import Dict, Any, Optional
from datetime import datetime
from openai import OpenAI
from ..config.settings import settings
from ..models.contract_summary import ContractSummary


class ContractSummaryService:
    """Service for extracting structured contract summaries using OpenAI GPT-4o"""

    def __init__(self):
        self.client = OpenAI(api_key=settings.openai_api_key)
        self.max_chars = settings.max_document_chars

    async def extract_contract_summary(self, document_text: str, filename: str) -> Dict[str, Any]:
        """
        Extract structured contract summary using OpenAI GPT-4o

        Args:
            document_text (str): Full text of the document
            filename (str): Name of the file being processed

        Returns:
            dict: Extraction result with success flag and summary data
        """
        try:
            print(f"Extracting contract summary from: {filename}")

            # Reject documents longer than the configured limit (returns an error result)
            if len(document_text) > self.max_chars:
                error_msg = f"Document too large: {len(document_text)} characters exceeds maximum of {self.max_chars} characters"
                print(f"Error: {error_msg}")
                return {
                    "success": False,
                    "error": error_msg,
                    "filename": filename
                }

            # Get the contract schema prompt
            contract_schema = self._get_contract_schema()

            # Create the prompt
            prompt = f"""
Document filename: {filename}

Document content:
{document_text}

{contract_schema}
"""

            # Call OpenAI API (the client is synchronous, so run it in a worker thread)
            response = await asyncio.to_thread(
                self.client.chat.completions.create,
                model="gpt-4o",
                messages=[
                    {
                        "role": "system",
                        "content": "You are a contract analysis expert. Extract contract information accurately and return only valid JSON."
                    },
                    {
                        "role": "user",
                        "content": prompt
                    }
                ],
                max_tokens=4000,
                temperature=0.1
            )

            # Extract the response (message.content can be None, e.g. on a refusal)
            content = (response.choices[0].message.content or "").strip()

            # Try to parse the JSON
            try:
                summary_json = json.loads(content)
                print(f"Successfully extracted summary for {filename}")
                return {
                    "success": True,
                    "summary": summary_json,
                    "filename": filename
                }
            except json.JSONDecodeError as e:
                print(f"JSON parsing error: {e}")
                print(f"Raw response length: {len(content)} characters")

                # Try to extract JSON from the response if it's wrapped in text
                import re
                json_match = re.search(r'\{.*\}', content, re.DOTALL)
                if json_match:
                    try:
                        summary_json = json.loads(json_match.group())
                        print(f"Successfully extracted JSON from wrapped response for {filename}")
                        return {
                            "success": True,
                            "summary": summary_json,
                            "filename": filename
                        }
                    except json.JSONDecodeError:
                        pass

                return {
                    "success": False,
                    "error": f"Failed to parse JSON response: {e}. Response length: {len(content)} characters",
                    "filename": filename
                }

        except Exception as e:
            print(f"Error calling OpenAI API: {e}")
            return {
                "success": False,
                "error": f"OpenAI API error: {str(e)}",
                "filename": filename
            }

    def validate_contract_summary(self, summary_json: Dict[str, Any]) -> ContractSummary:
        """
        Validate and convert raw JSON to structured ContractSummary model

        Args:
            summary_json (dict): Raw JSON from OpenAI

        Returns:
            ContractSummary: Validated summary object
        """
        try:
            # Convert None or empty-string values to "N/A (Not found in Doc)" for consistency
            def convert_none_values(obj):
                if isinstance(obj, dict):
                    return {k: convert_none_values(v) for k, v in obj.items()}
                elif obj is None or obj == "":
                    return "N/A (Not found in Doc)"
                return obj

            cleaned_summary = convert_none_values(summary_json)

            # Validate using Pydantic model
            contract_summary = ContractSummary(**cleaned_summary)
            return contract_summary

        except Exception as e:
            print(f"Error validating contract summary: {e}")
            # Return empty summary with error indication
            return ContractSummary(
                contract_type="Error in processing",
                overview_purpose=f"Error validating summary: {str(e)}"
            )

    def _get_contract_schema(self) -> str:
        """Get the contract analysis schema prompt"""
        return """
Please extract the following information from this contract document and return it in JSON format.
If any information is not found in the document, use "N/A (Not found in Doc)" as the value.

Required fields:
{
    "contract_type": "Type of contract (MSA, SOW, Supplier Contract, Vendor Contract, Licensing Agreement, NDA, etc.)",
    "overview_purpose": "Brief overview and purpose of the contract",
    "relevant_account": "Client account name or relevant account",
    "in_studio_name": "In-Studio Name (e.g., The Mix)",
    "client_sender_name": "Client/Sender Name",
    "client_sender_address": "Client/Sender Address",
    "agency_name": "Agency Name (OLM, IIG, AYS, BTG, etc.)",
    "agency_address": "Agency Address",
    "dates_signed": "Date(s) when contract was signed",
    "terms": "Contract terms/duration",
    "date_expired": "Contract expiration date",
    "pricing_payment_terms": "Pricing and payment terms overview",
    "scope_of_work": {
        "summary_tasks_deliverables": "Summary of tasks and deliverables",
        "key_dates": "Key dates and milestones",
        "key_kpis": "Key KPIs or performance indicators"
    },
    "terms_and_termination": {
        "duration": "Look for contract duration, term length, effective period, validity period, or how long this agreement remains in force. Search for phrases like 'term of', 'duration', 'effective for', 'valid until', 'expires on', or specific time periods",
        "termination_conditions": "Find termination clauses, conditions under which either party can end the agreement, notice periods required for termination, breach conditions, or circumstances that allow contract cancellation. Look for sections titled 'Termination', 'End of Agreement', or phrases like 'may be terminated', 'notice of termination'",
        "penalties": "Search for financial penalties, liquidated damages, fees, or costs associated with early termination, breach of contract, or cancellation. Look for monetary amounts, penalty clauses, or consequences of termination"
    },
    "payment_terms": {
        "payment_method": "Search for how payments are processed - check for bank transfer, wire transfer, check, ACH, credit card, electronic payment, or specific payment platforms. Look for banking details, payment processing instructions, or remittance information",
        "payment_schedule": "Find when payments are due - look for payment frequency (monthly, quarterly, annually), due dates, billing cycles, invoice terms, or payment timing. Search for phrases like 'payable within', 'due on', 'payment schedule', or specific dates",
        "pricing_details": "Look for detailed pricing structure including rates, fees, hourly rates, project costs, retainer amounts, or cost breakdowns. Search for currency amounts, pricing tables, rate cards, or cost schedules",
        "mark_ups": "Find any markup percentages, additional fees, surcharges, or percentage-based charges applied to costs. Look for percentage symbols, markup clauses, or additional charges",
        "payment_schedules": "Look for detailed payment timing including milestone payments, installment schedules, advance payments, or progress-based payment structures. Search for payment phases or staged payment plans",
        "late_payment_penalties": "Search for late payment fees, interest charges, penalty rates, or consequences of delayed payment. Look for percentage rates for late fees, daily charges, or penalty clauses",
        "discounts": "Find any available discounts, early payment incentives, volume discounts, or reduced rates for specific conditions. Look for percentage discounts or preferential pricing terms"
    },
    "liability_indemnification": {
        "responsibilities_liabilities": "Search for sections defining each party's responsibilities, obligations, duties, or liabilities. Look for phrases like 'responsible for', 'liable for', 'obligations include', 'duties of', or specific responsibility assignments",
        "damages_losses": "Find who bears responsibility for damages, losses, claims, or financial harm. Look for liability caps, exclusions, limitations of liability, or damage allocation clauses. Search for monetary limits or damage responsibility",
        "indemnification_clauses": "Look for indemnification provisions, hold harmless clauses, or protection from third-party claims. Search for phrases like 'indemnify', 'hold harmless', 'defend against', or protection from lawsuits and claims"
    },
    "confidentiality": {
        "scope": "Look for what information is considered confidential - proprietary data, trade secrets, business information, client data, technical information, or specific categories of protected information. Search for definitions of confidential information",
        "duration": "Find how long confidentiality obligations last - look for time periods, survival clauses, or duration of non-disclosure obligations. Search for phrases like 'perpetual', 'for X years', 'survives termination', or confidentiality periods",
        "exceptions": "Search for exceptions to confidentiality - publicly available information, independently developed information, or legally required disclosures. Look for carve-outs or situations where confidentiality doesn't apply",
        "disclosures_by_law": "Find circumstances where confidential information may be disclosed due to legal requirements, court orders, regulatory demands, or government requests. Look for legal disclosure provisions",
        "breach_consequences": "Search for penalties, damages, or consequences for violating confidentiality obligations. Look for monetary damages, injunctive relief, or specific penalties for breach of non-disclosure"
    },
    "intellectual_property": {
        "licensor": "Find who is granting intellectual property rights - the party providing licenses, copyrights, trademarks, or other IP rights. Look for the entity or person licensing their intellectual property",
        "licensee": "Identify who is receiving intellectual property rights - the party getting permission to use copyrights, trademarks, patents, or other IP. Look for the recipient of IP licensing",
        "terms_renewal": "Search for intellectual property renewal terms, license extension conditions, or how IP rights can be renewed or continued. Look for renewal clauses, automatic extensions, or renewal procedures",
        "pricing": "Find costs associated with intellectual property use - licensing fees, royalties, IP-related payments, or costs for using copyrighted or trademarked materials. Look for IP pricing structures",
        "definitions": "Look for definitions of intellectual property terms, what constitutes IP in this agreement, or specific IP categories covered. Search for IP definitions and scope of protected materials",
        "scope": "Find what intellectual property rights are included - copyrights, trademarks, patents, trade secrets, proprietary information, or specific IP assets covered by the agreement",
        "duration": "Search for how long intellectual property rights last - license duration, IP protection periods, or time limits on IP usage. Look for IP term lengths or expiration dates",
        "territory": "Find geographical limitations on IP rights - specific countries, regions, or territories where IP rights apply. Look for geographic restrictions or worldwide rights",
        "use_ownership_rights": "Search for permitted uses of intellectual property, ownership transfers, usage restrictions, or what can be done with the licensed IP. Look for usage rights and ownership provisions"
    },
    "dispute_resolution": {
        "methods": "Search for how disputes will be resolved - negotiation, mediation, arbitration, litigation, or alternative dispute resolution methods. Look for dispute resolution procedures or escalation processes",
        "mediation_options": "Find if mediation is required or available for resolving disputes - look for mediation clauses, mediator selection, or mediation procedures. Search for mediation requirements or options",
        "arbitration_options": "Look for arbitration clauses, arbitration requirements, arbitrator selection procedures, or binding arbitration provisions. Search for arbitration rules or arbitration organization references",
        "litigation_options": "Find court jurisdiction, governing law, or litigation procedures if disputes go to court. Look for jurisdiction clauses, court selection, or legal venue specifications"
    },
    "warranties_representations": {
        "service_standards": "Look for quality standards, performance expectations, service level agreements, or specific standards that services must meet. Search for performance metrics, quality requirements, or service benchmarks",
        "service_assurances": "Find warranties, guarantees, representations, or assurances about service quality, performance, or outcomes. Look for warranty clauses, service guarantees, or performance assurances"
    },
    "compliance_with_laws": {
        "relevant_laws": "Search for specific laws, regulations, statutes, or legal requirements that parties must comply with. Look for regulatory compliance, legal standards, or specific legislation mentioned in the contract",
        "owner_obligations": "Find legal obligations, compliance responsibilities, or regulatory duties that each party must fulfill. Look for compliance requirements, legal duties, or regulatory obligations"
    },
    "amendments_versions": {
        "change_management": "Look for how contract changes are managed - amendment procedures, modification processes, or change control mechanisms. Search for how the contract can be updated, modified, or amended",
        "written_consent": "Find requirements for written consent, signatures, or formal approval needed for contract changes. Look for amendment approval processes or consent requirements for modifications"
    },
    "assignment_subcontracting": {
        "delegation_assignment": "Search for rules about assigning contract rights, delegating obligations, or transferring responsibilities to third parties. Look for assignment clauses, subcontracting permissions, or restrictions on transferring contract duties. Find phrases like 'may not assign', 'assignment requires consent', or subcontracting limitations"
    }
}

IMPORTANT: Return ONLY valid JSON. Do not include any explanatory text before or after the JSON.
"""


# Global service instance
contract_summary_service = ContractSummaryService()
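`extract_contract_summary` first tries `json.loads` on the whole reply and, if the model wrapped the JSON in prose, falls back to the greedy `\{.*\}` regex with `re.DOTALL`. `validate_contract_summary` then replaces `None` and empty strings with `"N/A (Not found in Doc)"`. Below is a standalone sketch of those two steps using the hypothetical names `parse_model_json` and `fill_missing`. Unlike the service, it also rejects a top-level JSON array, on the assumption that `ContractSummary(**data)` needs a mapping.

```python
import json
import re
from typing import Any, Dict, Optional

# Greedy match from the first "{" to the last "}", spanning newlines (re.DOTALL)
_JSON_OBJECT_RE = re.compile(r"\{.*\}", re.DOTALL)


def parse_model_json(content: Optional[str]) -> Optional[Dict[str, Any]]:
    """Parse a JSON object from an LLM reply, tolerating text around it."""
    text = (content or "").strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        match = _JSON_OBJECT_RE.search(text)
        if not match:
            return None
        try:
            parsed = json.loads(match.group())
        except json.JSONDecodeError:
            return None
    # Only a top-level object can be unpacked into a model with **data
    return parsed if isinstance(parsed, dict) else None


def fill_missing(obj: Any, placeholder: str = "N/A (Not found in Doc)") -> Any:
    """Recursively replace None or empty-string values with a placeholder."""
    if isinstance(obj, dict):
        return {k: fill_missing(v, placeholder) for k, v in obj.items()}
    if obj is None or obj == "":
        return placeholder
    return obj
```

Because the match is greedy, a reply that contains two separate JSON objects is captured as one invalid span and yields `None` rather than the first object.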
209
backend/app/services/document_processor.py
Normal file
@ -0,0 +1,209 @@
|
|||
import os
|
||||
import uuid
|
||||
import shutil
|
||||
from typing import List, Dict, Any, Optional
|
||||
from pathlib import Path
|
||||
import aiofiles
|
||||
from fastapi import UploadFile, HTTPException
|
||||
from motor.motor_asyncio import AsyncIOMotorDatabase
|
||||
|
||||
from ..config.settings import settings
|
||||
from ..models.document import DocumentCreate, DocumentInDB
|
||||
from ..models.user import UserInDB
|
||||
from ..utils.file_utils import validate_file, get_file_info
|
||||
|
||||
class DocumentProcessor:
|
||||
def __init__(self):
|
||||
self.upload_dir = Path(settings.upload_dir)
|
||||
self.allowed_extensions = {
|
||||
'.pdf', '.docx', '.doc', '.txt', '.csv', '.json', '.html', '.md', '.rtf'
|
||||
}
|
||||
self.max_file_size = 50 * 1024 * 1024 # 50MB
|
||||
|
||||
async def process_upload(
|
||||
self,
|
||||
file: UploadFile,
|
||||
index_id: str,
|
||||
user: UserInDB,
|
||||
db: AsyncIOMotorDatabase
|
||||
) -> DocumentInDB:
|
||||
"""Process uploaded file and save to database"""
|
||||
# Validate file
|
||||
await self._validate_file(file)
|
||||
|
||||
# Generate unique filename
|
||||
file_extension = Path(file.filename).suffix.lower()
|
||||
unique_filename = f"{uuid.uuid4()}{file_extension}"
|
||||
|
||||
# Create index-specific upload directory
|
||||
index_upload_dir = self.upload_dir / index_id
|
||||
index_upload_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Save file
|
||||
file_path = index_upload_dir / unique_filename
|
||||
await self._save_file(file, file_path)
|
||||
|
||||
# Create document record
|
||||
document_data = DocumentCreate(
|
||||
filename=unique_filename,
|
||||
original_filename=file.filename,
|
||||
file_size=file.size,
|
||||
content_type=file.content_type,
|
||||
index_id=index_id,
|
||||
uploaded_by=user.id
|
||||
)
|
||||
|
||||
document_dict = document_data.dict()
|
||||
document_dict["file_path"] = str(file_path)
|
||||
document_dict["processing_status"] = "pending"
|
||||
document_dict["embedding_status"] = "pending"
|
||||
document_dict["metadata"] = {}
|
||||
document_dict["chunk_count"] = 0
|
||||
|
||||
# Save to database
|
||||
result = await db.documents.insert_one(document_dict)
|
||||
document_dict["_id"] = result.inserted_id
|
||||
|
||||
return DocumentInDB(**document_dict)
|
||||
|
||||
async def _validate_file(self, file: UploadFile):
|
||||
"""Validate uploaded file"""
|
||||
if not file.filename:
|
||||
raise HTTPException(status_code=400, detail="No file provided")
|
||||
|
||||
# Check file extension
|
||||
file_extension = Path(file.filename).suffix.lower()
|
||||
if file_extension not in self.allowed_extensions:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"File type {file_extension} not supported. Allowed types: {', '.join(self.allowed_extensions)}"
|
||||
)
|
||||
|
||||
# Check file size
|
||||
if file.size > self.max_file_size:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"File too large. Maximum size: {self.max_file_size / (1024*1024):.1f}MB"
|
||||
)
|
||||
|
||||
async def _save_file(self, file: UploadFile, file_path: Path):
|
||||
"""Save uploaded file to disk"""
|
||||
try:
|
||||
async with aiofiles.open(file_path, 'wb') as f:
|
||||
content = await file.read()
|
||||
await f.write(content)
|
||||
except Exception as e:
|
||||
raise HTTPException(
|
||||
status_code=500,
|
||||
detail=f"Error saving file: {str(e)}"
|
||||
)
|
||||
|
||||
async def delete_document(
|
||||
self,
|
||||
document_id: str,
|
||||
db: AsyncIOMotorDatabase
|
||||
) -> bool:
|
||||
"""Delete document and associated file with complete cleanup"""
|
||||
from bson import ObjectId
|
||||
|
||||
# Get document
|
||||
document = await db.documents.find_one({"_id": ObjectId(document_id)})
|
||||
if not document:
|
||||
return False
|
||||
|
||||
index_id = document["index_id"]
|
||||
|
||||
# Delete embeddings from vector store
|
||||
try:
|
||||
from .llama_processor import llama_processor
|
||||
await llama_processor.delete_document_embeddings(document_id, index_id)
|
||||
except Exception as e:
|
||||
print(f"Error deleting embeddings for document {document_id}: {e}")
|
||||
|
||||
# Delete file
|
||||
file_path = Path(document["file_path"])
|
||||
if file_path.exists():
|
||||
try:
|
||||
file_path.unlink()
|
||||
except Exception as e:
|
||||
print(f"Error deleting file {file_path}: {e}")
|
||||
|
||||
# Note: Cache clearing removed - caching is disabled for data freshness
|
||||
|
||||
# Delete document record
|
||||
result = await db.documents.delete_one({"_id": ObjectId(document_id)})
|
||||
return result.deleted_count > 0
|
||||
|
||||
async def get_documents_by_index(
|
||||
self,
|
||||
index_id: str,
|
||||
db: AsyncIOMotorDatabase
|
||||
) -> List[DocumentInDB]:
|
||||
"""Get all documents for an index"""
|
||||
documents = []
|
||||
cursor = db.documents.find({"index_id": index_id})
|
||||
|
||||
async for doc in cursor:
|
||||
documents.append(DocumentInDB(**doc))
|
||||
|
||||
return documents
|
||||
|
||||
async def update_processing_status(
|
||||
self,
|
||||
document_id: str,
|
||||
status: str,
|
||||
metadata: Optional[Dict[str, Any]] = None,
|
||||
db: AsyncIOMotorDatabase = None
|
||||
):
|
||||
"""Update document processing status"""
|
||||
from bson import ObjectId
|
||||
from datetime import datetime
|
||||
|
||||
update_data = {
|
||||
"processing_status": status,
|
||||
"updated_at": datetime.utcnow()
|
||||
}
|
||||
|
||||
if metadata:
|
||||
update_data["metadata"] = metadata
|
||||
# Store parsed text if available
|
||||
if "parsed_text" in metadata:
|
||||
update_data["parsed_text"] = metadata["parsed_text"]
|
||||
if "chunk_count" in metadata:
|
||||
update_data["chunk_count"] = metadata["chunk_count"]
|
||||
|
||||
await db.documents.update_one(
|
||||
{"_id": ObjectId(document_id)},
|
||||
{"$set": update_data}
|
||||
)
|
||||
|
||||
    async def update_embedding_status(
        self,
        document_id: str,
        status: str,
        metadata: Optional[Dict[str, Any]] = None,
        db: AsyncIOMotorDatabase = None
    ):
        """Update document embedding status"""
        from bson import ObjectId
        from datetime import datetime

        update_data = {
            "embedding_status": status,
            "updated_at": datetime.utcnow()
        }

        if metadata:
            # Store vector information if available
            if "vector_ids" in metadata:
                update_data["vector_ids"] = metadata["vector_ids"]
            if "chunk_count" in metadata:
                update_data["chunk_count"] = metadata["chunk_count"]

        await db.documents.update_one(
            {"_id": ObjectId(document_id)},
            {"$set": update_data}
        )

# Global processor instance
document_processor = DocumentProcessor()
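The `$set` payload that `update_processing_status` sends to MongoDB can be isolated as a pure function, which makes its field-promotion rule (copying `parsed_text` and `chunk_count` out of `metadata` onto the document itself) testable without a database. This is a hypothetical sketch: `build_processing_update` does not exist in the codebase, and it takes an injectable `now` (defaulting to a timezone-aware UTC time) instead of calling `datetime.utcnow()` directly.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_processing_update(
    status: str,
    metadata: Optional[Dict[str, Any]] = None,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the $set payload of update_processing_status."""
    update: Dict[str, Any] = {
        "processing_status": status,
        "updated_at": now or datetime.now(timezone.utc),
    }
    # An empty or missing metadata dict adds nothing, matching `if metadata:`
    if metadata:
        update["metadata"] = metadata
        # Promote selected metadata keys to top-level document fields
        for key in ("parsed_text", "chunk_count"):
            if key in metadata:
                update[key] = metadata[key]
    return {"$set": update}
```

With a fixed clock, `build_processing_update("completed", {"chunk_count": 12}, now=fixed)` yields a `$set` containing `processing_status`, `updated_at`, `metadata` and `chunk_count`, but no `parsed_text`.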
881
backend/app/services/llama_processor.py
Normal file
@ -0,0 +1,881 @@
import os
import uuid
import asyncio
from typing import List, Dict, Any, Optional, Union
from pathlib import Path
import aiofiles
from fastapi import UploadFile, HTTPException
from motor.motor_asyncio import AsyncIOMotorDatabase
from bson import ObjectId
from datetime import datetime

from llama_index.core import (
    VectorStoreIndex,
    StorageContext,
    Settings,
    Document as LlamaDocument
)
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.core.embeddings import BaseEmbedding
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_cloud_services import LlamaParse
import chromadb
from chromadb.config import Settings as ChromaSettings
from chromadb.utils import embedding_functions

from ..config.settings import settings
from ..models.document import DocumentInDB, DocumentCreate
from ..core.chroma_client import chroma_singleton
from ..models.index import IndexInDB
from ..models.user import UserInDB
from ..utils.file_utils import validate_file, get_file_info
from ..services.contract_summary_service import contract_summary_service


class LlamaProcessor:
    def __init__(self):
        self.upload_dir = Path(settings.upload_dir)
        self.indices_dir = Path(settings.indices_dir)
        self.allowed_extensions = {
            '.pdf', '.docx', '.doc', '.txt', '.csv', '.json', '.html', '.md', '.rtf'
        }
        self.max_file_size = 50 * 1024 * 1024  # 50MB

        # Strong references to in-flight background processing tasks. The event
        # loop keeps only weak references to tasks, so an unreferenced task can be
        # garbage-collected before it finishes.
        self._background_tasks: set = set()

        # Initialize LlamaIndex components
        self._setup_llama_index()

        # ChromaDB client managed by singleton

    def get_chroma_client(self):
        """Get or create ChromaDB client using shared singleton"""
        chroma_db_path = str(self.indices_dir / "chroma_db")
        return chroma_singleton.get_client(chroma_db_path)

    def _setup_llama_index(self):
        """Setup LlamaIndex components"""
        # Configure OpenAI
        Settings.llm = OpenAI(
            model="gpt-4o",
            api_key=settings.openai_api_key,
            temperature=0.1
        )

        # Configure embeddings
        Settings.embed_model = OpenAIEmbedding(
            model="text-embedding-3-small",
            api_key=settings.openai_api_key
        )

        # Configure semantic text splitter
        Settings.text_splitter = SemanticSplitterNodeParser.from_defaults(
            embed_model=OpenAIEmbedding(
                model="text-embedding-3-small",
                api_key=settings.openai_api_key
            ),
            buffer_size=2,
            breakpoint_percentile_threshold=70
        )

    async def process_single_file(
        self,
        file: UploadFile,
        index_id: str,
        user: UserInDB,
        db: AsyncIOMotorDatabase,
        custom_name: Optional[str] = None
    ) -> DocumentInDB:
        """Process a single uploaded file"""
        # Validate file
        await self._validate_file(file)

        # Generate unique filename
        file_extension = Path(file.filename).suffix.lower()
        unique_filename = f"{uuid.uuid4()}{file_extension}"

        # Create index-specific upload directory
        index_upload_dir = self.upload_dir / index_id
        index_upload_dir.mkdir(parents=True, exist_ok=True)

        # Save file
        file_path = index_upload_dir / unique_filename
        await self._save_file(file, file_path)

        # Create document record
        document_name = custom_name or file.filename
        document_data = DocumentCreate(
            filename=unique_filename,
            original_filename=document_name,
            file_size=file.size,
            content_type=file.content_type,
            index_id=index_id,
            uploaded_by=user.id
        )

        document_dict = document_data.dict()
        document_dict["file_path"] = str(file_path)
        document_dict["processing_status"] = "pending"
        document_dict["embedding_status"] = "pending"
        document_dict["summary_status"] = "pending"
        document_dict["metadata"] = {}
        document_dict["created_at"] = datetime.utcnow()
        document_dict["updated_at"] = datetime.utcnow()

        # Save to database
        result = await db.documents.insert_one(document_dict)
        document_dict["_id"] = result.inserted_id

        document = DocumentInDB(**document_dict)

        # Process document asynchronously; keep a reference to the task until it completes
        task = asyncio.create_task(self._process_document_async(document, db))
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)

        return document

    async def process_multiple_files(
        self,
        files: List[UploadFile],
        index_id: str,
        user: UserInDB,
        db: AsyncIOMotorDatabase,
        base_name: str
    ) -> List[DocumentInDB]:
        """Process multiple uploaded files"""
        documents = []

        for i, file in enumerate(files, 1):
            # Generate custom name with serial number
            file_extension = Path(file.filename).suffix.lower()
            custom_name = f"{base_name}_{i:03d}{file_extension}"

            document = await self.process_single_file(
                file, index_id, user, db, custom_name
            )
            documents.append(document)

        return documents

    async def _process_document_async(self, document: DocumentInDB, db: AsyncIOMotorDatabase):
        """Process document asynchronously"""
        print(f"Starting processing for document {document.id}: {document.original_filename}")

        try:
            # Update status to processing
            print(f"Setting document {document.id} to processing status")
            await self._update_document_status(
                document.id, "processing", "pending", "pending", db
            )

            # Small delay to ensure status update is committed
            await asyncio.sleep(0.1)

            # Parse document text
            print(f"Parsing document text for {document.id}")
            parsed_text = await self._parse_document_text(document.file_path)
            print(f"Parsed text length: {len(parsed_text)} characters")

            # Update parsing status
            await self._update_document_status(
                document.id, "completed", "processing", "pending", db
            )

            # Create text chunks
            print(f"Creating text chunks for {document.id}")
            chunks = await self._create_text_chunks(parsed_text, document.index_id)
            print(f"Created {len(chunks)} chunks")

            # Create embeddings and store in vector database
            print(f"Creating embeddings for {document.id}")
            vector_ids = await self._create_embeddings(
                chunks, document.index_id, str(document.id)
            )
            print(f"Created {len(vector_ids)} embeddings")

            # Update document with parsed data
            print(f"Updating document {document.id} with parsed data")
            await self._update_document_with_parsed_data(
                document.id, parsed_text, chunks, vector_ids, db
            )

            # Update status to completed (core processing done)
            print(f"Completing processing for document {document.id}")
            await self._update_document_status(
                document.id, "completed", "completed", "pending", db
            )

            # Extract contract summary (awaited here, but a failure does not fail the document)
            print(f"Extracting contract summary for {document.id}")
            try:
                await self._extract_contract_summary(
                    document.id, parsed_text, document.original_filename, db
                )
            except Exception as summary_error:
                print(f"Warning: Contract summary extraction failed for {document.id}: {summary_error}")
                # Mark summary as failed but don't fail the entire document
                await self._update_document_status(
                    document.id, "completed", "completed", "failed", db
                )

            print(f"Successfully processed document {document.id}")

        except Exception as e:
            print(f"Error processing document {document.id}: {str(e)}")
            import traceback
            traceback.print_exc()

            await self._update_document_status(
                document.id, "failed", "failed", "failed", db
            )
            # Store error in metadata
            await db.documents.update_one(
                {"_id": ObjectId(document.id)},
                {"$set": {"metadata.error": str(e)}}
            )

    async def _parse_document_text(self, file_path: str) -> str:
        """Parse text from document using LlamaParse with premium mode (async)"""
        file_path = Path(file_path)

        print(f"Parsing file: {file_path}")
        print(f"File exists: {file_path.exists()}")
        print(f"File size: {file_path.stat().st_size if file_path.exists() else 'N/A'}")

        # Check if LlamaParse API key is available
        if not settings.llamaparse_api_key:
            raise HTTPException(
                status_code=500,
                detail="LlamaParse API key is required for document processing"
            )

        try:
            print("Using LlamaParse with premium mode (async)")

            # Run LlamaParse in a thread pool to avoid blocking the event loop
            def _run_llamaparse():
                parser = LlamaParse(
                    api_key=settings.llamaparse_api_key,
                    premium_mode=True,
                    result_type="markdown",
                    verbose=True
                )
                return parser.load_data(str(file_path))

            # Execute the synchronous LlamaParse call in the default thread pool
            # (get_running_loop is the recommended call from inside a coroutine)
            loop = asyncio.get_running_loop()
            documents = await loop.run_in_executor(None, _run_llamaparse)

            print(f"LlamaParse loaded {len(documents)} documents from file")

            # Combine all document text
            full_text = ""
            for doc in documents:
                full_text += doc.text + "\n\n"

            text_result = full_text.strip()
            print(f"Final parsed text length: {len(text_result)} characters")

            if len(text_result) == 0:
                raise Exception("No text extracted from document via LlamaParse")

            return text_result

        except Exception as e:
            print(f"Error in _parse_document_text with LlamaParse: {str(e)}")
            raise HTTPException(
                status_code=500,
                detail=f"Error parsing document with LlamaParse: {str(e)}"
            )

    async def _create_text_chunks(self, text: str, index_id: str) -> List[str]:
        """Create text chunks using LlamaIndex SemanticSplitter"""
        try:
            print(f"Creating semantic chunks for text of length {len(text)}")

            # Create semantic splitter with OpenAI embeddings
            semantic_splitter = SemanticSplitterNodeParser.from_defaults(
                embed_model=OpenAIEmbedding(
                    model="text-embedding-3-small",
                    api_key=settings.openai_api_key
                ),
                buffer_size=2,
                breakpoint_percentile_threshold=70
            )

            # Create a document and split it semantically
            llama_doc = LlamaDocument(text=text)
            nodes = semantic_splitter.get_nodes_from_documents([llama_doc])

            # Extract text from nodes
            chunks = [node.text for node in nodes]

            print(f"Created {len(chunks)} semantic chunks")

            if len(chunks) == 0:
                raise Exception("No semantic chunks created from text")

            return chunks

        except Exception as e:
            print(f"Error in _create_text_chunks: {str(e)}")
            raise HTTPException(
                status_code=500,
                detail=f"Error creating semantic text chunks: {str(e)}"
            )

    async def _create_embeddings(
        self,
        chunks: List[str],
        index_id: str,
        document_id: str
    ) -> List[str]:
        """Create embeddings and store using LlamaIndex"""
        try:
            collection_name = f"index_{index_id}"
            print(f"🔍 DEBUG - Creating embeddings using LlamaIndex for collection: {collection_name}")

            # Create LlamaIndex documents from chunks
            documents = []
            vector_ids = []

            for i, chunk in enumerate(chunks):
                chunk_id = f"{document_id}_{i}"
                vector_ids.append(chunk_id)

                # Create LlamaIndex document with metadata
                doc = LlamaDocument(
                    text=chunk,
                    metadata={
                        "document_id": document_id,
                        "chunk_index": i,
                        "index_id": index_id,
                        "chunk_id": chunk_id
                    }
                )
                documents.append(doc)

            print(f" - Created {len(documents)} LlamaIndex documents")
            print(f" - Document IDs: {vector_ids}")
            print(f" - Document lengths: {[len(doc.text) for doc in documents]}")

            # Get or create the ChromaDB collection for this index
            client = self.get_chroma_client()
            try:
                chroma_collection = client.get_collection(name=collection_name)
                current_count = chroma_collection.count()
                print(f" - Found existing collection with {current_count} vectors")
            except Exception:
                # Create collection - let LlamaIndex handle the embedding function
                print(f" - Creating new ChromaDB collection")
                chroma_collection = client.create_collection(
                    name=collection_name,
                    metadata={"hnsw:space": "cosine"}
                )

            # Create LlamaIndex vector store and index
            vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
            storage_context = StorageContext.from_defaults(vector_store=vector_store)

            # Check if index exists
            try:
                # Try to load existing index
                index = VectorStoreIndex.from_vector_store(
                    vector_store=vector_store,
                    storage_context=storage_context
                )
                print(f" - Loaded existing LlamaIndex VectorStoreIndex")
            except Exception:
                # Create new index
                index = VectorStoreIndex.from_documents(
                    [],  # Start empty
                    storage_context=storage_context
                )
                print(f" - Created new LlamaIndex VectorStoreIndex")

            # Insert documents into the index from a worker thread so the event loop is not blocked
            print(f" - Inserting {len(documents)} documents into LlamaIndex")

            def _insert_documents():
                for doc in documents:
                    print(f" - Inserting document chunk with metadata: {doc.metadata}")
                    index.insert(doc)
                return True

            # Run embedding creation in thread pool to avoid blocking
            await asyncio.get_running_loop().run_in_executor(None, _insert_documents)

            # Verify the final count
            final_count = chroma_collection.count()
            print(f" - Collection count after adding: {final_count}")
            print(f" - Successfully added {len(vector_ids)} vectors for document {document_id}")

            return vector_ids

        except Exception as e:
            print(f"Error in _create_embeddings: {str(e)}")
            import traceback
            traceback.print_exc()
            raise HTTPException(
                status_code=500,
                detail=f"Error creating embeddings: {str(e)}"
            )

    async def _update_document_status(
        self,
        document_id: str,
        processing_status: str,
        embedding_status: str,
        summary_status: str,
        db: AsyncIOMotorDatabase
    ):
        """Update document processing, embedding, and summary status"""
        await db.documents.update_one(
            {"_id": ObjectId(document_id)},
            {"$set": {
                "processing_status": processing_status,
                "embedding_status": embedding_status,
                "summary_status": summary_status,
                "updated_at": datetime.utcnow()
            }}
        )

    async def _update_document_with_parsed_data(
        self,
        document_id: str,
        parsed_text: str,
        chunks: List[str],
        vector_ids: List[str],
        db: AsyncIOMotorDatabase
    ):
        """Update document with parsed data"""
        await db.documents.update_one(
            {"_id": ObjectId(document_id)},
            {"$set": {
                "parsed_text": parsed_text,
                "text_chunks": chunks,
                "chunk_count": len(chunks),
                "vector_ids": vector_ids,
                "updated_at": datetime.utcnow()
            }}
        )

    async def _extract_contract_summary(
        self,
        document_id: str,
        parsed_text: str,
        filename: str,
        db: AsyncIOMotorDatabase
    ):
        """Extract contract summary asynchronously"""
        try:
            # Update summary status to processing
            await db.documents.update_one(
                {"_id": ObjectId(document_id)},
                {"$set": {
                    "summary_status": "processing",
                    "updated_at": datetime.utcnow()
                }}
            )

            # Extract summary using the contract summary service
            result = await contract_summary_service.extract_contract_summary(
                parsed_text, filename
            )

            if result["success"]:
                # Validate the summary
                validated_summary = contract_summary_service.validate_contract_summary(
                    result["summary"]
                )

                # Store in database
                await db.documents.update_one(
                    {"_id": ObjectId(document_id)},
                    {"$set": {
                        "contract_summary": validated_summary.dict(),
                        "summary_status": "completed",
                        "summary_created_at": datetime.utcnow(),
                        "updated_at": datetime.utcnow()
                    }}
                )

                print(f"Successfully extracted contract summary for {document_id}")

            else:
                # Store error
                await db.documents.update_one(
                    {"_id": ObjectId(document_id)},
                    {"$set": {
                        "summary_status": "failed",
                        "metadata.summary_error": result.get("error", "Unknown error"),
                        "updated_at": datetime.utcnow()
                    }}
                )

                print(f"Failed to extract contract summary for {document_id}: {result.get('error')}")

        except Exception as e:
            print(f"Error extracting contract summary for {document_id}: {str(e)}")
            import traceback
            traceback.print_exc()

            await db.documents.update_one(
                {"_id": ObjectId(document_id)},
                {"$set": {
                    "summary_status": "failed",
                    "metadata.summary_error": str(e),
                    "updated_at": datetime.utcnow()
                }}
            )

    async def _validate_file(self, file: UploadFile):
        """Validate uploaded file"""
        if not file.filename:
            raise HTTPException(status_code=400, detail="No file provided")

        # Check file extension
        file_extension = Path(file.filename).suffix.lower()
        if file_extension not in self.allowed_extensions:
            raise HTTPException(
                status_code=400,
                detail=f"File type {file_extension} not supported. Allowed types: {', '.join(sorted(self.allowed_extensions))}"
            )

        # Check file size (UploadFile.size is Optional[int], so guard against None)
        if file.size is not None and file.size > self.max_file_size:
            raise HTTPException(
                status_code=400,
                detail=f"File too large. Maximum size: {self.max_file_size / (1024*1024):.1f}MB"
            )

    async def _save_file(self, file: UploadFile, file_path: Path):
        """Save uploaded file to disk"""
        try:
            async with aiofiles.open(file_path, 'wb') as f:
                content = await file.read()
                await f.write(content)
        except Exception as e:
            raise HTTPException(
                status_code=500,
                detail=f"Error saving file: {str(e)}"
            )

    async def query_documents(
        self,
        query: str,
        index_id: str,
        top_k: int = 10
    ) -> List[Dict[str, Any]]:
        """Query documents using LlamaIndex VectorStoreIndex"""
        try:
            collection_name = f"index_{index_id}"

            # Ensure consistent embedding model for LlamaIndex
            Settings.embed_model = OpenAIEmbedding(
                model="text-embedding-3-small",
                api_key=settings.openai_api_key
            )

            # DEBUG: LlamaIndex query setup
            print(f"🔍 DEBUG - LlamaIndex Query Execution:")
            print(f" - Query: '{query}'")
            print(f" - Collection name: {collection_name}")
            print(f" - Top K: {top_k}")
            print(f" - Settings.embed_model: {type(Settings.embed_model).__name__}")

            # Test embedding dimensions
            try:
                test_embedding = Settings.embed_model.get_text_embedding("test")
                print(f" - Query embedding dimensions: {len(test_embedding)}")
            except Exception as e:
                print(f" - ERROR getting test embedding: {e}")

            # Check if collection exists (without specifying embedding function)
            try:
                client = self.get_chroma_client()
                chroma_collection = client.get_collection(name=collection_name)
                collection_count = chroma_collection.count()
                print(f" - Found collection with {collection_count} documents")
            except Exception as collection_error:
                print(f" - Collection {collection_name} not found: {collection_error}")
                raise HTTPException(
                    status_code=404,
                    detail=f"Vector collection for index '{index_id}' does not exist. Documents may still be processing or failed to process. Please check the admin panel for processing status."
                )

            # Checked outside the try block above so this more specific 404 is not
            # swallowed and replaced by the generic "does not exist" error
            if collection_count == 0:
                raise HTTPException(
                    status_code=404,
                    detail=f"Index '{index_id}' exists but contains no processed documents. Please upload and process documents first."
                )

            # Create LlamaIndex VectorStore and Index
            try:
                print(f" - Creating LlamaIndex VectorStore...")
                vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
                storage_context = StorageContext.from_defaults(vector_store=vector_store)

                # Load the index
                index = VectorStoreIndex.from_vector_store(
                    vector_store=vector_store,
                    storage_context=storage_context
                )

                # Create query engine
                query_engine = index.as_query_engine(
                    similarity_top_k=top_k,
                    response_mode="no_text"  # Only return source nodes, no generated text
                )

                print(f" - Executing LlamaIndex query...")
                response = query_engine.query(query)
                print(f" - Query successful")

                # Extract and format results from source nodes
                formatted_results = []
                if hasattr(response, 'source_nodes') and response.source_nodes:
                    for i, node in enumerate(response.source_nodes):
                        # Get similarity score (higher is better)
                        similarity_score = node.score if hasattr(node, 'score') and node.score is not None else 0.5

                        # Convert similarity score to distance (lower is better)
                        # LlamaIndex typically returns scores between 0 and 1, where 1 is most similar
                        distance = 1.0 - similarity_score

                        formatted_results.append({
                            "content": node.text,
                            "metadata": node.metadata,
                            "score": similarity_score,
                            "distance": distance,
                            "document_id": node.metadata.get("document_id", "unknown"),
                            "chunk_index": node.metadata.get("chunk_index", i)
                        })
                    print(f" - Retrieved {len(formatted_results)} relevant chunks")
                    print(f" - Similarity scores: {[r['score'] for r in formatted_results]}")
                    print(f" - Distance values: {[r['distance'] for r in formatted_results]}")
                else:
                    print(f" - No source nodes found in response")

            except Exception as query_error:
                print(f" - ERROR during LlamaIndex query: {query_error}")
                print(f" - Error type: {type(query_error).__name__}")
                raise

            if not formatted_results:
                raise HTTPException(
                    status_code=404,
                    detail=f"No relevant documents found for query '{query}' in index '{index_id}'. Try rephrasing your question or ensure documents are properly processed."
                )

            return formatted_results

        except HTTPException:
            # Re-raise HTTPExceptions as-is
            raise
        except Exception as e:
            print(f"Unexpected error in query_documents: {e}")
            raise HTTPException(
                status_code=500,
                detail=f"Error querying documents: {str(e)}"
            )

    def generate_response(
        self,
        query: str,
        context_chunks: List[str],
        index_id: str
    ) -> str:
        """Generate response using OpenAI with retrieved context"""
        try:
            # Prepare context
            context = "\n\n".join(context_chunks)

            # Create prompt
            prompt = f"""Based on the following context, answer the user's question. If the answer is not in the context, say "I don't have enough information to answer that question."

Context:
{context}

Question: {query}

Answer:"""

            # Generate response using OpenAI
            llm = OpenAI(
                model="gpt-4o",
                api_key=settings.openai_api_key,
                temperature=0.1
            )

            # Use sync completion
            response = llm.complete(prompt)

            return response.text

        except Exception as e:
            raise HTTPException(
                status_code=500,
                detail=f"Error generating response: {str(e)}"
            )

    async def delete_document_embeddings(
        self,
        document_id: str,
        index_id: str
    ):
        """Delete document embeddings from vector store"""
        try:
            collection_name = f"index_{index_id}"
            print(f"🗑️ DEBUG - Deleting embeddings for document {document_id} from collection {collection_name}")

            collection = self.get_chroma_client().get_collection(name=collection_name)

            # Get collection count before deletion
            count_before = collection.count()
            print(f" - Collection count before deletion: {count_before}")

            # Strategy 1: Try to delete by document_id metadata
            try:
                results = collection.get(where={"document_id": document_id})
                print(f" - Strategy 1 - Found {len(results['ids']) if results['ids'] else 0} vectors by document_id")

                if results["ids"]:
                    collection.delete(ids=results["ids"])
                    count_after = collection.count()
                    print(f" - Strategy 1 - Successfully deleted {count_before - count_after} vectors")
                    if count_before != count_after:
                        return  # Success, exit early
            except Exception as e:
                print(f" - Strategy 1 failed: {e}")

            # Strategy 2: Try to delete by chunk_id pattern (document_id_*)
            try:
                # Get all vectors and filter by chunk_id pattern
                all_results = collection.get()
                matching_ids = []

                if all_results["ids"] and all_results["metadatas"]:
                    for vid, metadata in zip(all_results["ids"], all_results["metadatas"]):
                        if metadata and "chunk_id" in metadata:
                            if metadata["chunk_id"].startswith(f"{document_id}_"):
                                matching_ids.append(vid)

                print(f" - Strategy 2 - Found {len(matching_ids)} vectors by chunk_id pattern")

                if matching_ids:
                    collection.delete(ids=matching_ids)
                    count_after = collection.count()
                    print(f" - Strategy 2 - Successfully deleted {count_before - count_after} vectors")
                    if count_before != count_after:
                        return  # Success, exit early
            except Exception as e:
                print(f" - Strategy 2 failed: {e}")

            # Strategy 3: Try to delete by any metadata value equal to document_id
            try:
                all_results = collection.get()
                matching_ids = []

                if all_results["ids"] and all_results["metadatas"]:
                    for vid, metadata in zip(all_results["ids"], all_results["metadatas"]):
                        if metadata:
                            # Check whether any metadata value equals the document_id
                            for value in metadata.values():
                                if str(value) == document_id:
                                    matching_ids.append(vid)
                                    break

                print(f" - Strategy 3 - Found {len(matching_ids)} vectors by metadata scan")

                if matching_ids:
                    collection.delete(ids=matching_ids)
                    count_after = collection.count()
                    print(f" - Strategy 3 - Successfully deleted {count_before - count_after} vectors")
                    if count_before != count_after:
                        return  # Success, exit early
            except Exception as e:
                print(f" - Strategy 3 failed: {e}")

            # If we get here, no vectors were deleted
            print(f" - WARNING: No vectors were deleted for document {document_id}")

            # Debug: Show some sample metadata to understand the structure
            try:
                sample_results = collection.get(limit=3)
                print(f" - DEBUG: Sample metadata structures:")
                for i, metadata in enumerate((sample_results.get("metadatas") or [])[:3]):
                    print(f" - Sample {i}: {metadata}")
            except Exception as e:
                print(f" - DEBUG: Could not get sample metadata: {e}")

        except Exception as e:
            print(f"Error deleting embeddings for document {document_id}: {e}")
            import traceback
            traceback.print_exc()

    def check_collection_exists(self, index_id: str) -> bool:
        """Check if ChromaDB collection exists for an index"""
        try:
            collection_name = f"index_{index_id}"
            # get_collection raises if the collection does not exist
            self.get_chroma_client().get_collection(name=collection_name)
            return True
        except Exception:
            return False

    def get_collection_info(self, index_id: str) -> Dict[str, Any]:
        """Get information about a ChromaDB collection"""
        try:
            collection_name = f"index_{index_id}"
            collection = self.get_chroma_client().get_collection(name=collection_name)

            return {
                "exists": True,
                "name": collection_name,
                "count": collection.count(),
                "metadata": collection.metadata
            }
        except Exception as e:
            return {
                "exists": False,
                "name": f"index_{index_id}",
                "count": 0,
                "error": str(e)
            }

    async def generate_document_summary(self, text: str, filename: str) -> str:
        """Generate AI summary of a document"""
        try:
            # Reject documents that exceed the configured size limit
            # (the error is returned as a string, not raised)
            if len(text) > settings.max_summary_chars:
                error_msg = f"Document too large for summary: {len(text)} characters exceeds maximum of {settings.max_summary_chars} characters"
                print(f"Error: {error_msg}")
                return f"Error: {error_msg}"

            # Create summarization prompt
            prompt = f"""Please provide a concise summary of the following document "{filename}".
Focus on the main points, key information, and important details.
Keep the summary between 150-300 words.

Document content:
{text}

Summary:"""

            # Generate summary using OpenAI via Settings.llm
            response = await Settings.llm.acomplete(prompt)
            summary = str(response).strip()

            # Fallback if summary is too short
            if len(summary) < 50:
                summary = "Unable to generate detailed summary. This document contains text that may require manual review."

            return summary

        except Exception as e:
            return f"Error generating summary: {str(e)}"


# Global processor instance
llama_processor = LlamaProcessor()
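The size guard and short-output fallback in `generate_document_summary` can be exercised without calling OpenAI. The following is a minimal stdlib-only sketch of that control flow, assuming `MAX_SUMMARY_CHARS` stands in for `settings.max_summary_chars` and `fake_llm` stands in for `Settings.llm.acomplete`; both names are hypothetical and not part of the codebase.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-in for settings.max_summary_chars
MAX_SUMMARY_CHARS = 1_000
FALLBACK = ("Unable to generate detailed summary. "
            "This document contains text that may require manual review.")


async def summarize(text: str, filename: str,
                    llm: Callable[[str], Awaitable[str]]) -> str:
    """Mirror the control flow of generate_document_summary with an injectable LLM."""
    try:
        # Size guard: return (not raise) an error string, as the original does
        if len(text) > MAX_SUMMARY_CHARS:
            return (f"Error: Document too large for summary: {len(text)} characters "
                    f"exceeds maximum of {MAX_SUMMARY_CHARS} characters")
        prompt = (f'Please provide a concise summary of the following document "{filename}".'
                  f"\n\n{text}\n\nSummary:")
        summary = (await llm(prompt)).strip()
        # Very short model outputs are replaced by a fixed fallback message
        return FALLBACK if len(summary) < 50 else summary
    except Exception as e:
        return f"Error generating summary: {e}"


async def fake_llm(prompt: str) -> str:
    # Hypothetical LLM returning a one-word answer, which triggers the fallback
    return "  Contract.  "


if __name__ == "__main__":
    print(asyncio.run(summarize("short text", "a.pdf", fake_llm)))
    print(asyncio.run(summarize("x" * 2_000, "big.pdf", fake_llm)))
```

Injecting the LLM as a parameter is only a testing convenience for this sketch; the service itself reads the global `Settings.llm`.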
303
backend/app/services/rag_service.py
Normal file
@ -0,0 +1,303 @@
import os
import json
import asyncio
import shutil  # used by delete_index (shutil.rmtree)
from typing import Dict, Any, List, Optional
from pathlib import Path
from datetime import datetime

from llama_index.core import VectorStoreIndex, StorageContext, Settings
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
import chromadb

from ..config.settings import settings
from ..models.index import IndexInDB
from ..core.chroma_client import chroma_singleton

class RAGService:
    def __init__(self):
        self.openai_api_key = settings.openai_api_key
        self.llamaparse_api_key = settings.llamaparse_api_key
        self.indices_dir = Path(settings.indices_dir)
        self.upload_dir = Path(settings.upload_dir)

        # Configure LlamaIndex settings
        Settings.llm = OpenAI(
            model="gpt-4o",
            api_key=self.openai_api_key,
            temperature=0.1
        )

        Settings.embed_model = OpenAIEmbedding(
            model="text-embedding-3-small",
            api_key=self.openai_api_key
        )

        # Ensure directories exist
        self.indices_dir.mkdir(parents=True, exist_ok=True)

    def get_chroma_client(self):
        """Get or create ChromaDB client using shared singleton"""
        chroma_db_path = str(self.indices_dir / "chroma_db")
        return chroma_singleton.get_client(chroma_db_path)

    # NOTE: Index creation is now handled by LlamaProcessor
    # (the deprecated RAGService method has been removed)

    async def query_index(
        self,
        index_id: str,
        query: str,
        top_k: int = 10
    ) -> Dict[str, Any]:
        """Query an existing index"""
        try:
            index_dir = self.indices_dir / index_id
            if not index_dir.exists():
                return {
                    "success": False,
                    "message": f"Index {index_id} not found"
                }

            # Ensure consistent embedding model before querying
            embedding_model = OpenAIEmbedding(
                model="text-embedding-3-small",
                api_key=self.openai_api_key
            )
            Settings.embed_model = embedding_model

            # DEBUG: Log embedding model details
            print(f"🔍 DEBUG - Embedding Model Configuration:")
            print(f" - Model: text-embedding-3-small")
            print(f" - API Key present: {bool(self.openai_api_key)}")
            print(f" - Settings.embed_model: {type(Settings.embed_model).__name__}")

            # Test embedding to get dimensions
            try:
                test_embedding = embedding_model.get_text_embedding("test")
                print(f" - Test embedding dimensions: {len(test_embedding)}")
            except Exception as e:
                print(f" - ERROR getting test embedding: {e}")

            # Load index (use consistent collection naming)
            chroma_client = self.get_chroma_client()
            collection_name = f"index_{index_id}"

            # DEBUG: ChromaDB collection info
            try:
                chroma_collection = chroma_client.get_collection(name=collection_name)
                collection_metadata = chroma_collection.metadata
                collection_count = chroma_collection.count()
                print(f"🔍 DEBUG - ChromaDB Collection:")
                print(f" - Collection name: {collection_name}")
                print(f" - Document count: {collection_count}")
                print(f" - Collection metadata: {collection_metadata}")

                # Peek at one stored vector to check its dimensions. Explicit
                # None/len() checks are used because some chromadb versions may
                # return embeddings as a NumPy array, whose truth value is ambiguous
                if collection_count > 0:
                    peek_result = chroma_collection.peek(limit=1)
                    embeddings = peek_result.get('embeddings') if peek_result else None
                    if embeddings is not None and len(embeddings) > 0:
                        stored_dim = len(embeddings[0]) if embeddings[0] is not None else "None"
                        print(f" - Stored vector dimensions: {stored_dim}")
                    else:
                        print(f" - No embeddings found in peek result")
            except Exception as e:
                print(f" - ERROR accessing collection: {e}")
                raise e

            vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
            storage_context = StorageContext.from_defaults(vector_store=vector_store)

            # Load the index
            index = VectorStoreIndex.from_vector_store(
                vector_store=vector_store,
                storage_context=storage_context
            )

            # Create query engine
            query_engine = index.as_query_engine(
                similarity_top_k=top_k,
                response_mode="compact"
            )

            # DEBUG: Query execution details
            print(f"🔍 DEBUG - Query Execution:")
            print(f" - Query: '{query}'")
            print(f" - Top K: {top_k}")
            print(f" - Current Settings.embed_model: {type(Settings.embed_model).__name__}")

            # Test query embedding before execution
            try:
                query_embedding = Settings.embed_model.get_text_embedding(query)
                print(f" - Query embedding dimensions: {len(query_embedding)}")
            except Exception as e:
                print(f" - ERROR getting query embedding: {e}")

            # Execute query
            start_time = datetime.now()
            print(f" - Starting query execution at {start_time}")
            try:
                response = query_engine.query(query)
                print(f" - Query executed successfully")
            except Exception as e:
                print(f" - ERROR during query execution: {e}")
                print(f" - Error type: {type(e).__name__}")
                raise e
            end_time = datetime.now()

            # Extract source information
            source_info = []
            if hasattr(response, 'source_nodes'):
                for node in response.source_nodes:
                    source_info.append({
                        "filename": node.metadata.get('filename', 'Unknown'),
                        "score": node.score,
                        "text_snippet": node.text[:200] + "..." if len(node.text) > 200 else node.text
                    })

            return {
                "success": True,
                "response": str(response),
                "sources": source_info,
                "query_time": (end_time - start_time).total_seconds(),
                "debug": {
                    "query": query,
                    "index_id": index_id,
                    "top_k": top_k,
                    "source_count": len(source_info)
                }
            }

        except Exception as e:
            print(f"Error querying index: {e}")
            return {
                "success": False,
                "message": f"Error querying index: {str(e)}"
            }

    # NOTE: The following are now handled by LlamaProcessor, and the deprecated
    # RAGService methods for them have been removed:
    #   - document loading
    #   - LlamaParse processing
    #   - document reading
    #   - adding documents to an index

    async def delete_index(self, index_id: str) -> bool:
        """Delete an index and all associated files"""
        try:
            index_dir = self.indices_dir / index_id
            if index_dir.exists():
                shutil.rmtree(index_dir)
                return True
            return False
        except Exception as e:
            print(f"Error deleting index: {e}")
            return False

    async def delete_index_complete(self, index_id: str) -> Dict[str, Any]:
        """Complete index deletion including ChromaDB cleanup"""
        try:
            # Delete vector index files
            file_success = await self.delete_index(index_id)

            # Delete ChromaDB collection
            chroma_client = self.get_chroma_client()
            collection_name = f"index_{index_id}"

            collection_deleted = False
            try:
                chroma_client.delete_collection(collection_name)
                collection_deleted = True
                print(f"Successfully deleted ChromaDB collection: {collection_name}")
            except Exception as e:
                print(f"Warning: Could not delete ChromaDB collection {collection_name}: {e}")

            # Clean up orphaned metadata in shared ChromaDB database
            metadata_cleaned = self._cleanup_chromadb_metadata(index_id)

            return {
                "success": True,
                "message": "Index completely deleted",
                "details": {
                    "files_deleted": file_success,
                    "collection_deleted": collection_deleted,
                    "metadata_cleaned": metadata_cleaned
                }
            }

        except Exception as e:
            return {
                "success": False,
                "message": f"Error during complete index deletion: {str(e)}"
            }

    def _cleanup_chromadb_metadata(self, index_id: str) -> bool:
        """Clean up orphaned metadata in ChromaDB SQLite database for specific index"""
        import sqlite3

        # NOTE: This writes directly to ChromaDB's internal SQLite file. That schema
        # is not a public chromadb API and may differ between releases; any error
        # here is logged and reported as False rather than raised.
        chroma_db_path = str(self.indices_dir / "chroma_db" / "chroma.sqlite3")
        collection_name = f"index_{index_id}"

        try:
            with sqlite3.connect(chroma_db_path) as conn:
                cursor = conn.cursor()

                # Get the collection_id first
                cursor.execute("""
                    SELECT id FROM collections WHERE name = ?
                """, (collection_name,))
                collection_result = cursor.fetchone()

                if collection_result:
                    collection_id = collection_result[0]

                    # Delete embedding metadata for this specific collection
                    cursor.execute("""
                        DELETE FROM embedding_metadata
                        WHERE id IN (
                            SELECT em.id FROM embedding_metadata em
                            JOIN embeddings e ON em.id = e.id
                            WHERE e.collection_id = ?
                        )
                    """, (collection_id,))

                    metadata_count = cursor.rowcount

                    # Delete embeddings for this collection
                    cursor.execute("""
                        DELETE FROM embeddings
                        WHERE collection_id = ?
                    """, (collection_id,))

                    embedding_count = cursor.rowcount

                    # Delete the collection record
                    cursor.execute("""
                        DELETE FROM collections
                        WHERE id = ?
                    """, (collection_id,))

                    conn.commit()
                    print(f"Cleaned up ChromaDB metadata for {collection_name}: {metadata_count} metadata entries, {embedding_count} embeddings")
                    return True
                else:
                    print(f"No collection found with name {collection_name}")
                    return True  # Not an error if collection doesn't exist

        except Exception as e:
            print(f"Warning: Could not clean up ChromaDB metadata for {collection_name}: {e}")
            return False

    def index_exists(self, index_id: str) -> bool:
        """Check if an index exists"""
        index_dir = self.indices_dir / index_id
        return index_dir.exists()


# Global RAG service instance
rag_service = RAGService()
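`query_index` turns each retrieved node into a small dict with a 200-character snippet. The stdlib-only sketch below reproduces that step so the truncation rule can be checked in isolation; `FakeNode` is a hypothetical stand-in for LlamaIndex's scored node that models only the `text`, `score` and `metadata` attributes the original code reads.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing only the attributes query_index reads."""
    text: str
    score: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def extract_sources(nodes: List[FakeNode], max_chars: int = 200) -> List[Dict[str, Any]]:
    """Build the `sources` list the same way query_index does."""
    sources = []
    for node in nodes:
        # Truncate long chunks and mark the cut with an ellipsis
        snippet = node.text[:max_chars] + "..." if len(node.text) > max_chars else node.text
        sources.append({
            "filename": node.metadata.get("filename", "Unknown"),
            "score": node.score,
            "text_snippet": snippet,
        })
    return sources


if __name__ == "__main__":
    nodes = [FakeNode("a" * 250, 0.82, {"filename": "sow.pdf"}), FakeNode("short", 0.4)]
    for source in extract_sources(nodes):
        print(source["filename"], source["score"], len(source["text_snippet"]))
```

Note that a truncated snippet is `max_chars + 3` characters long because the ellipsis is appended after the cut.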
234
backend/app/services/sso_service.py
Normal file
@ -0,0 +1,234 @@
"""
SSO Service for Azure AD token validation and user management
"""
import httpx
import json
from typing import Optional, Dict, Any
from datetime import datetime
from jose import jwt, JWTError
from fastapi import HTTPException, status
from app.config.settings import settings
from app.models.user import UserInDB, UserCreate, AuthMethod, UserRole
from app.config.database import get_database
from app.core.security import get_password_hash
import logging

logger = logging.getLogger(__name__)

class SSOService:
    def __init__(self):
        self.authority = settings.azure_authority
        self.client_id = settings.azure_client_id
        self.tenant_id = settings.azure_tenant_id
        self._discovery_cache = {}
        self._jwks_cache = {}

    async def get_discovery_document(self) -> Dict[str, Any]:
        """Get Azure AD OAuth2 endpoints for MSAL token validation"""
        if not self.authority:
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail="Azure AD authority not configured"
            )

        # Standard Azure AD v2.0 endpoints for MSAL
        discovery_doc = {
            "authorization_endpoint": f"{self.authority}/oauth2/v2.0/authorize",
            "token_endpoint": f"{self.authority}/oauth2/v2.0/token",
            "jwks_uri": f"{self.authority}/discovery/v2.0/keys",
            "issuer": f"https://login.microsoftonline.com/{self.tenant_id}/v2.0"
        }

        return discovery_doc

    async def get_jwks(self) -> Dict[str, Any]:
        """Get JSON Web Key Set from Azure AD"""
        discovery_doc = await self.get_discovery_document()
        jwks_uri = discovery_doc.get("jwks_uri")

        if not jwks_uri:
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail="JWKS URI not found in discovery document"
            )

        # NOTE: Keys are cached for the lifetime of the process. Azure AD rotates
        # its signing keys, so tokens signed with a newly published key will fail
        # the kid lookup in validate_token until the process is restarted
        if jwks_uri in self._jwks_cache:
            return self._jwks_cache[jwks_uri]

        try:
            async with httpx.AsyncClient() as client:
                response = await client.get(jwks_uri)
                response.raise_for_status()
                jwks = response.json()
                self._jwks_cache[jwks_uri] = jwks
                return jwks
        except Exception as e:
            logger.error(f"Failed to get JWKS: {e}")
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail="Failed to validate token: Cannot retrieve signing keys"
            )

    async def validate_token(self, token: str) -> Dict[str, Any]:
        """Validate Azure AD access token"""
        try:
            # Log token details for debugging
            logger.info(f"Validating token...")
            unverified_payload = jwt.get_unverified_claims(token)
            logger.info(f"Token payload: {unverified_payload}")
            logger.info(f"Token issuer: {unverified_payload.get('iss')}")
            logger.info(f"Token audience: {unverified_payload.get('aud')}")
            logger.info(f"Expected audience: {self.client_id}")
            logger.info(f"Expected issuer: https://login.microsoftonline.com/{self.tenant_id}/v2.0")

            # Get JWKS for token validation
            jwks = await self.get_jwks()

            # Decode token header to get key ID
            unverified_header = jwt.get_unverified_header(token)
            kid = unverified_header.get("kid")

            if not kid:
                raise HTTPException(
                    status_code=status.HTTP_401_UNAUTHORIZED,
                    detail="Token missing key ID"
                )

            # Find the correct key
            key = None
            for jwk in jwks.get("keys", []):
                if jwk.get("kid") == kid:
                    key = jwk
                    break

            if not key:
                raise HTTPException(
                    status_code=status.HTTP_401_UNAUTHORIZED,
                    detail="Token key not found in JWKS"
                )

            # Validate MSAL ID token with proper v2.0 validation
            payload = jwt.decode(
                token,
                key,
                algorithms=["RS256"],
                audience=self.client_id,
                issuer=f"https://login.microsoftonline.com/{self.tenant_id}/v2.0"
            )

            logger.info(f"Token validation successful: {payload}")
            return payload

        except HTTPException:
            # Propagate the 401s raised above unchanged; without this clause the
            # generic handler below would convert them into 500 errors
            raise
        except JWTError as e:
            logger.error(f"JWT validation error: {e}")
            raise HTTPException(
                status_code=status.HTTP_401_UNAUTHORIZED,
                detail=f"Invalid token: {str(e)}"
            )
        except Exception as e:
            logger.error(f"Token validation error: {e}")
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail=f"Token validation failed: {str(e)}"
            )

    def extract_user_info(self, token_payload: Dict[str, Any]) -> Dict[str, Any]:
        """Extract user information from token payload"""
        return {
            "sso_user_id": token_payload.get("sub"),
            "sso_email": token_payload.get("email", token_payload.get("preferred_username")),
            "sso_name": token_payload.get("name"),
            "email": token_payload.get("email", token_payload.get("preferred_username")),
            "tenant_id": token_payload.get("tid"),
            "app_id": token_payload.get("appid"),
        }

    async def get_or_create_user(self, token_payload: Dict[str, Any]) -> UserInDB:
        """Get existing SSO user or create new one"""
        user_info = self.extract_user_info(token_payload)
        sso_user_id = user_info.get("sso_user_id")
        email = user_info.get("email")

        if not sso_user_id or not email:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Token missing required user information"
            )

        db = get_database()
        users_collection = db["users"]

        # First try to find by SSO user ID
        user_doc = await users_collection.find_one({"sso_user_id": sso_user_id})

        # If not found, try to find by email (for existing local users upgrading to SSO)
        if not user_doc:
            user_doc = await users_collection.find_one({"email": email})

        if user_doc:
            # Update existing user with SSO info and last login
            update_data = {
                "auth_method": AuthMethod.SSO,
                "sso_provider": "azure",
                "sso_user_id": sso_user_id,
                "sso_email": user_info.get("sso_email"),
                "sso_name": user_info.get("sso_name"),
                "sso_attributes": token_payload,
                "last_sso_login": datetime.utcnow(),
                "updated_at": datetime.utcnow()
            }

            await users_collection.update_one(
                {"_id": user_doc["_id"]},
                {"$set": update_data}
            )

            # Fetch updated user
            user_doc = await users_collection.find_one({"_id": user_doc["_id"]})
            return UserInDB(**user_doc)
        else:
            # Create new SSO user with minimal access
            new_user = UserInDB(
                email=email,
                role=UserRole.USER,  # New SSO users default to 'user' role
                is_active=True,
                auth_method=AuthMethod.SSO,
                sso_provider="azure",
                sso_user_id=sso_user_id,
                sso_email=user_info.get("sso_email"),
                sso_name=user_info.get("sso_name"),
                sso_attributes=token_payload,
                last_sso_login=datetime.utcnow(),
                index_access=[],  # No index access initially
                created_at=datetime.utcnow(),
                updated_at=datetime.utcnow(),
                hashed_password=None  # SSO users don't need passwords
            )

            # Insert new user
            result = await users_collection.insert_one(new_user.model_dump(by_alias=True, exclude={"id"}))

            # Fetch created user
            user_doc = await users_collection.find_one({"_id": result.inserted_id})
            logger.info(f"Created new SSO user: {email}")
            return UserInDB(**user_doc)

    async def process_sso_login(self, access_token: str) -> UserInDB:
        """Complete SSO login process"""
        # Validate the access token
        token_payload = await self.validate_token(access_token)

        # Get or create user
        user = await self.get_or_create_user(token_payload)

        # Check if user is active
        if not user.is_active:
            raise HTTPException(
                status_code=status.HTTP_401_UNAUTHORIZED,
                detail="User account is disabled"
            )

        return user


# Create singleton instance
sso_service = SSOService()
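`validate_token` reads the unverified `kid` from the token header and selects the matching entry from the JWKS before handing both to `jwt.decode`. The stdlib-only sketch below illustrates just that lookup step: it base64url-decodes the header segment by hand (roughly what `jwt.get_unverified_header` does) and returns the matching key. It performs no signature verification and is for illustration only; `find_signing_key` and the sample JWKS are hypothetical.

```python
import base64
import json
from typing import Any, Dict, Optional


def b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data: bytes) -> str:
    # Inverse of b64url_decode, used here only to build a sample token
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def unverified_header(token: str) -> Dict[str, Any]:
    """Decode the JOSE header (first dot-separated segment) WITHOUT verifying anything."""
    return json.loads(b64url_decode(token.split(".")[0]))


def find_signing_key(token: str, jwks: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the JWK whose 'kid' matches the token header, or None."""
    kid = unverified_header(token).get("kid")
    if not kid:
        return None
    return next((k for k in jwks.get("keys", []) if k.get("kid") == kid), None)


if __name__ == "__main__":
    header = b64url_encode(json.dumps({"alg": "RS256", "kid": "key-2"}).encode())
    token = f"{header}.e30.sig"  # "e30" is base64url for "{}"; the signature is a dummy
    sample_jwks = {"keys": [{"kid": "key-1"}, {"kid": "key-2", "kty": "RSA"}]}
    print(find_signing_key(token, sample_jwks))
```

Returning `None` for both a missing `kid` and an unknown `kid` corresponds to the two 401 branches in `validate_token`.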
9
backend/app/utils/__init__.py
Normal file
@ -0,0 +1,9 @@
from .file_utils import validate_file, get_file_info, ensure_directory, clean_filename, get_upload_path

__all__ = [
    "validate_file",
    "get_file_info",
    "ensure_directory",
    "clean_filename",
    "get_upload_path"
]
87
backend/app/utils/file_utils.py
Normal file
@ -0,0 +1,87 @@
import os
import mimetypes
from pathlib import Path
from typing import Dict, Any, Optional
from fastapi import UploadFile

def validate_file(file: UploadFile) -> Dict[str, Any]:
    """Validate uploaded file and return file info"""
    if not file.filename:
        raise ValueError("No filename provided")

    # Get file extension
    file_path = Path(file.filename)
    extension = file_path.suffix.lower()

    # Get MIME type (guessed from the filename's extension, not from the file content)
    mime_type, _ = mimetypes.guess_type(file.filename)

    # Validate MIME type
    allowed_mime_types = {
        'application/pdf',
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'application/msword',
        'text/plain',
        'text/csv',
        'application/json',
        'text/html',
        'text/markdown',
        'application/rtf'
    }

    if mime_type not in allowed_mime_types:
        raise ValueError(f"MIME type {mime_type} not supported")

    return {
        'filename': file.filename,
        'extension': extension,
        'mime_type': mime_type,
        'size': file.size
    }

def get_file_info(file_path: Path) -> Dict[str, Any]:
    """Get information about a file"""
    if not file_path.exists():
        raise FileNotFoundError(f"File {file_path} not found")

    stat = file_path.stat()
    mime_type, _ = mimetypes.guess_type(str(file_path))

    return {
        'filename': file_path.name,
        'extension': file_path.suffix.lower(),
        'mime_type': mime_type,
        'size': stat.st_size,
        # On Unix, st_ctime is the last metadata-change time, not the creation time
        'created_at': stat.st_ctime,
        'modified_at': stat.st_mtime
    }

def ensure_directory(directory: Path) -> None:
    """Ensure directory exists"""
    directory.mkdir(parents=True, exist_ok=True)


def clean_filename(filename: str) -> str:
    """Clean filename to be filesystem-safe"""
    # Remove or replace problematic characters
    invalid_chars = '<>:"/\\|?*'
    cleaned = filename

    for char in invalid_chars:
        cleaned = cleaned.replace(char, '_')

    # Remove leading/trailing dots and spaces
    cleaned = cleaned.strip('. ')

    # Ensure it's not empty
    if not cleaned:
        cleaned = "unnamed_file"

    return cleaned


def get_upload_path(index_id: str, filename: str, base_dir: str) -> Path:
    """Generate upload path for a file"""
    base_path = Path(base_dir)
    index_path = base_path / index_id
    ensure_directory(index_path)

    return index_path / clean_filename(filename)
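`clean_filename` replaces each of the characters `<>:"/\|?*` with `_`, strips leading and trailing dots and spaces, and falls back to `unnamed_file`. An equivalent one-pass formulation with `str.maketrans`/`str.translate` is sketched below for comparison; `clean_filename_translate` is a hypothetical name, and like the original it does not handle control characters or reserved Windows names such as `CON`.

```python
# Map each forbidden character to "_" in a single translation table
_INVALID_CHARS = '<>:"/\\|?*'
_TABLE = str.maketrans({ch: "_" for ch in _INVALID_CHARS})


def clean_filename_translate(filename: str) -> str:
    """Same result as clean_filename, using one translate() pass."""
    cleaned = filename.translate(_TABLE).strip(". ")
    # Fall back to a fixed name when nothing usable is left
    return cleaned or "unnamed_file"


if __name__ == "__main__":
    for name in ["  ../a:b?.pdf ", "report<v2>.docx", "..."]:
        print(repr(name), "->", repr(clean_filename_translate(name)))
```

Path separators are replaced before stripping, so an input such as `../a` cannot climb out of the upload directory through the filename alone.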
40
backend/docker-compose.yml
Normal file
@ -0,0 +1,40 @@
services:
  app:
    build: .
    ports:
      - "8001:8000"
    environment:
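      # Credentials must match MONGO_INITDB_ROOT_* on the mongo service (mongod runs with --auth);
      # the root user is created in the admin database, hence authSource=admin
      - MONGODB_URL=mongodb://netflix:netflix@mongo:27017/?authSource=admin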
      - REDIS_URL=redis://redis:6379
      - DATABASE_NAME=contract_analysis
    depends_on:
      - mongo
      - redis
    volumes:
      - ./uploads:/app/uploads
      - ./indices:/app/indices
      - ./.env:/app/.env
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

  mongo:
    image: mongo:7
    ports:
      - "27017:27017"
    volumes:
      - mongo_data:/data/db
    environment:
      MONGO_INITDB_DATABASE: contract_analysis
      MONGO_INITDB_ROOT_USERNAME: netflix
      MONGO_INITDB_ROOT_PASSWORD: netflix
    command: mongod --auth

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

volumes:
  mongo_data:
  redis_data:
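Because the `mongo` service runs `mongod --auth`, clients must supply credentials in their connection string. A minimal stdlib sketch for assembling such a URI is shown below; credentials are percent-escaped with `urllib.parse.quote_plus`, which matters when a username or password contains characters such as `@` or `:`. The helper name `mongo_uri` is hypothetical, and `authSource=admin` assumes the user was created through `MONGO_INITDB_ROOT_*`, which places it in the `admin` database.

```python
from urllib.parse import quote_plus


def mongo_uri(user: str, password: str, host: str = "mongo", port: int = 27017,
              auth_source: str = "admin") -> str:
    """Build a mongodb:// URI with percent-escaped credentials (hypothetical helper)."""
    return (f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/?authSource={auth_source}")


if __name__ == "__main__":
    print(mongo_uri("netflix", "netflix"))
    print(mongo_uri("svc", "p@ss:word", host="localhost"))
```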
77
backend/test_chat_fixes.py
Normal file
@ -0,0 +1,77 @@
#!/usr/bin/env python3
"""
Test script to verify chat fixes are working correctly.
Run this after starting the backend server.
"""

import asyncio
import sys
import os

# Add the backend directory (which contains the `app` package) to the Python path
# and import the services through `app.`, so package-relative imports inside them resolve
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from motor.motor_asyncio import AsyncIOMotorClient
from datetime import datetime
from app.services.llama_processor import llama_processor
from app.services.chat_context_service import chat_context_service


async def test_chat_fixes():
    print("🧪 Testing Chat Fixes...")

    # Test 1: Check if LlamaProcessor methods work
    print("\n1. Testing LlamaProcessor collection methods...")

    test_index_id = "test-index-123"

    # Test collection existence check
    exists = llama_processor.check_collection_exists(test_index_id)
    print(f" Collection exists: {exists}")

    # Test collection info
    info = llama_processor.get_collection_info(test_index_id)
    print(f" Collection info: {info}")

    # Test 2: Check MongoDB connection
    print("\n2. Testing MongoDB connection...")
    try:
        client = AsyncIOMotorClient("mongodb://localhost:27017")
        db = client.contract_analysis

        # Test document count
        doc_count = await db.documents.count_documents({})
        print(f" Total documents in DB: {doc_count}")

        # Test indices count
        indices_count = await db.indices.count_documents({})
        print(f" Total indices in DB: {indices_count}")

        # Motor's close() is a regular method, not a coroutine, so it is not awaited
        client.close()
    except Exception as e:
        print(f" MongoDB connection failed: {e}")

    # Test 3: Test timestamp generation
    print("\n3. Testing timestamp generation...")
    now = datetime.utcnow()
    print(f" Current UTC timestamp: {now}")
    print(f" Formatted: {now.strftime('%Y-%m-%d %H:%M:%S')}")

    # Test 4: Test context service
    print("\n4. Testing context service...")
    try:
        # Test context formatting
        test_messages = [
            {"query": "What is this document about?", "response": "This document is about contracts.", "created_at": now},
            {"query": "Tell me more", "response": "It contains legal terms and conditions.", "created_at": now}
        ]

        formatted_context = chat_context_service.format_context_for_ai(test_messages)
        print(f" Context formatted successfully: {len(formatted_context)} characters")

    except Exception as e:
        print(f" Context service test failed: {e}")

    print("\n✅ Chat fixes test completed!")


if __name__ == "__main__":
    asyncio.run(test_chat_fixes())
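Awaiting a synchronous method, as the original `await client.close()` did, fails at runtime: the method runs, returns `None`, and `await None` raises `TypeError`. The stdlib-only sketch below demonstrates this with a hypothetical `FakeClient` whose `close()` is synchronous, in the way Motor's client `close()` is; it does not use Motor itself.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in whose close() is synchronous, like Motor's client."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


async def await_close(client: FakeClient) -> str:
    try:
        # close() runs first and returns None; awaiting None then fails
        await client.close()
    except TypeError as e:
        return f"TypeError: {e}"
    return "ok"


if __name__ == "__main__":
    c = FakeClient()
    print(asyncio.run(await_close(c)), c.closed)
```

Because the error is raised inside the script's `try` block, the original version would have reported it as "MongoDB connection failed" even after both counts succeeded.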
BIN
backend/uploads/1/be21c186-acad-4be5-81f4-a4518d8c05d3.pdf
Normal file
Binary file not shown.
31
contract-query.service
Normal file
@ -0,0 +1,31 @@
[Unit]
Description=Contract Query Backend API
After=network.target
Wants=network.target

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/var/www/html/contract-query/backend
Environment=PATH=/var/www/html/contract-query/backend/venv/bin
ExecStart=/var/www/html/contract-query/backend/venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 8001
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal

# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=false
ProtectHome=false
ReadWritePaths=/var/www/html/contract-query/backend

# Environment variables
EnvironmentFile=/var/www/html/contract-query/backend/.env

[Install]
WantedBy=multi-user.target
Binary file not shown.
BIN
contracts/Acrisure Oliver SOW 2023_2.23.23 - Fully Executed.pdf
Normal file
Binary file not shown.
BIN
contracts/Updated_Acrisure Oliver SOW 2024_Client Signed.pdf
Normal file
Binary file not shown.
1183
contracts_documentation.md
Normal file
File diff suppressed because it is too large
2
frontend/.env.example
Normal file
@ -0,0 +1,2 @@
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
20
frontend/.eslintrc.js
Normal file
@ -0,0 +1,20 @@
module.exports = {
  root: true,
  env: { browser: true, es2020: true },
  extends: [
    'eslint:recommended',
    'plugin:react/recommended',
    'plugin:react/jsx-runtime',
    'plugin:react-hooks/recommended',
  ],
  ignorePatterns: ['dist', '.eslintrc.js'],
  parserOptions: { ecmaVersion: 'latest', sourceType: 'module' },
  settings: { react: { version: '18.2' } },
  plugins: ['react-refresh'],
  rules: {
    'react-refresh/only-export-components': [
      'warn',
      { allowConstantExport: true },
    ],
  },
}
16
frontend/index.html
Normal file
@ -0,0 +1,16 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Montserrat:ital,wght@0,100..900;1,100..900&display=swap" rel="stylesheet">
    <title>Contract Analysis Tool</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.jsx"></script>
  </body>
</html>
1
frontend/node_modules/.bin/acorn
generated
vendored
Symbolic link
1
frontend/node_modules/.bin/acorn
generated
vendored
Symbolic link
|
|
@ -0,0 +1 @@
|
|||
../acorn/bin/acorn
|
||||
1
frontend/node_modules/.bin/autoprefixer
generated
vendored
Symbolic link
1
frontend/node_modules/.bin/autoprefixer
generated
vendored
Symbolic link
|
|
@ -0,0 +1 @@
|
|||
../autoprefixer/bin/autoprefixer
|
||||
1
frontend/node_modules/.bin/browserslist
generated
vendored
Symbolic link
1
frontend/node_modules/.bin/browserslist
generated
vendored
Symbolic link
|
|
@ -0,0 +1 @@
|
|||
../browserslist/cli.js
|
||||
1
frontend/node_modules/.bin/cssesc
generated
vendored
Symbolic link
1
frontend/node_modules/.bin/cssesc
generated
vendored
Symbolic link
|
|
@ -0,0 +1 @@
|
|||
../cssesc/bin/cssesc
|
||||
1
frontend/node_modules/.bin/esbuild
generated
vendored
Symbolic link
1
frontend/node_modules/.bin/esbuild
generated
vendored
Symbolic link
|
|
@ -0,0 +1 @@
|
|||
../esbuild/bin/esbuild
|
||||
1
frontend/node_modules/.bin/eslint
generated
vendored
Symbolic link
1
frontend/node_modules/.bin/eslint
generated
vendored
Symbolic link
|
|
@ -0,0 +1 @@
|
|||
../eslint/bin/eslint.js
|
||||
1
frontend/node_modules/.bin/jiti
generated
vendored
Symbolic link
1
frontend/node_modules/.bin/jiti
generated
vendored
Symbolic link
|
|
@ -0,0 +1 @@
|
|||
../jiti/bin/jiti.js
|
||||
1
frontend/node_modules/.bin/js-yaml
generated
vendored
Symbolic link
1
frontend/node_modules/.bin/js-yaml
generated
vendored
Symbolic link
|
|
@ -0,0 +1 @@
|
|||
../js-yaml/bin/js-yaml.js
|
||||
1
frontend/node_modules/.bin/jsesc
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../jsesc/bin/jsesc
1
frontend/node_modules/.bin/json5
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../json5/lib/cli.js
1
frontend/node_modules/.bin/loose-envify
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../loose-envify/cli.js
1
frontend/node_modules/.bin/nanoid
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../nanoid/bin/nanoid.cjs
1
frontend/node_modules/.bin/node-which
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../which/bin/node-which
1
frontend/node_modules/.bin/parser
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../@babel/parser/bin/babel-parser.js
1
frontend/node_modules/.bin/resolve
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../resolve/bin/resolve
1
frontend/node_modules/.bin/rimraf
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../rimraf/bin.js
1
frontend/node_modules/.bin/rollup
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../rollup/dist/bin/rollup
1
frontend/node_modules/.bin/semver
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../semver/bin/semver.js
1
frontend/node_modules/.bin/showdown
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../showdown/bin/showdown.js
1
frontend/node_modules/.bin/sucrase
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../sucrase/bin/sucrase
1
frontend/node_modules/.bin/sucrase-node
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../sucrase/bin/sucrase-node
1
frontend/node_modules/.bin/tailwind
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../tailwindcss/lib/cli.js
1
frontend/node_modules/.bin/tailwindcss
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../tailwindcss/lib/cli.js
1
frontend/node_modules/.bin/update-browserslist-db
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../update-browserslist-db/cli.js
1
frontend/node_modules/.bin/vite
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../vite/bin/vite.js
1
frontend/node_modules/.bin/yaml
generated
vendored
Symbolic link
@@ -0,0 +1 @@
../yaml/bin.mjs
6340
frontend/node_modules/.package-lock.json
generated
vendored
Normal file
File diff suppressed because it is too large
126
frontend/node_modules/.vite/deps/@azure_msal-browser.js
generated
vendored
Normal file
@@ -0,0 +1,126 @@
import {
  AccountEntity,
  ApiId,
  AuthError,
  AuthErrorCodes_exports,
  AuthErrorMessage,
  AuthenticationHeaderParser,
  AuthenticationScheme,
  AzureCloudInstance,
  BrowserAuthError,
  BrowserAuthErrorCodes_exports,
  BrowserAuthErrorMessage,
  BrowserCacheLocation,
  BrowserConfigurationAuthError,
  BrowserConfigurationAuthErrorCodes_exports,
  BrowserConfigurationAuthErrorMessage,
  BrowserPerformanceClient,
  BrowserUtils_exports,
  CacheLookupPolicy,
  ClientAuthError,
  ClientAuthErrorCodes_exports,
  ClientAuthErrorMessage,
  ClientConfigurationError,
  ClientConfigurationErrorCodes_exports,
  ClientConfigurationErrorMessage,
  DEFAULT_IFRAME_TIMEOUT_MS,
  EventHandler,
  EventMessageUtils,
  EventType,
  InteractionRequiredAuthError,
  InteractionRequiredAuthErrorCodes_exports,
  InteractionRequiredAuthErrorMessage,
  InteractionStatus,
  InteractionType,
  JsonWebTokenTypes,
  LocalStorage,
  LogLevel,
  Logger,
  MemoryStorage,
  NavigationClient,
  OIDC_DEFAULT_SCOPES,
  PerformanceEvents,
  PromptValue,
  ProtocolMode,
  PublicClientApplication,
  PublicClientNext,
  ServerError,
  ServerResponseType,
  SessionStorage,
  SignedHttpRequest,
  StringUtils,
  StubPerformanceClient,
  UrlString,
  WrapperSKU,
  createNestablePublicClientApplication,
  createStandardPublicClientApplication,
  isPlatformBrokerAvailable,
  stubbedPublicClientApplication,
  version
} from "./chunk-DRYC2OA6.js";
import {
  BrowserPerformanceMeasurement
} from "./chunk-U543NHT4.js";
import "./chunk-G3PMV62Z.js";
export {
  AccountEntity,
  ApiId,
  AuthError,
  AuthErrorCodes_exports as AuthErrorCodes,
  AuthErrorMessage,
  AuthenticationHeaderParser,
  AuthenticationScheme,
  AzureCloudInstance,
  BrowserAuthError,
  BrowserAuthErrorCodes_exports as BrowserAuthErrorCodes,
  BrowserAuthErrorMessage,
  BrowserCacheLocation,
  BrowserConfigurationAuthError,
  BrowserConfigurationAuthErrorCodes_exports as BrowserConfigurationAuthErrorCodes,
  BrowserConfigurationAuthErrorMessage,
  BrowserPerformanceClient,
  BrowserPerformanceMeasurement,
  BrowserUtils_exports as BrowserUtils,
  CacheLookupPolicy,
  ClientAuthError,
  ClientAuthErrorCodes_exports as ClientAuthErrorCodes,
  ClientAuthErrorMessage,
  ClientConfigurationError,
  ClientConfigurationErrorCodes_exports as ClientConfigurationErrorCodes,
  ClientConfigurationErrorMessage,
  DEFAULT_IFRAME_TIMEOUT_MS,
  EventHandler,
  EventMessageUtils,
  EventType,
  InteractionRequiredAuthError,
  InteractionRequiredAuthErrorCodes_exports as InteractionRequiredAuthErrorCodes,
  InteractionRequiredAuthErrorMessage,
  InteractionStatus,
  InteractionType,
  JsonWebTokenTypes,
  LocalStorage,
  LogLevel,
  Logger,
  MemoryStorage,
  NavigationClient,
  OIDC_DEFAULT_SCOPES,
  PerformanceEvents,
  PromptValue,
  ProtocolMode,
  PublicClientApplication,
  PublicClientNext,
  ServerError,
  ServerResponseType,
  SessionStorage,
  SignedHttpRequest,
  StringUtils,
  StubPerformanceClient,
  UrlString,
  WrapperSKU,
  createNestablePublicClientApplication,
  createStandardPublicClientApplication,
  isPlatformBrokerAvailable,
  stubbedPublicClientApplication,
  version
};
//# sourceMappingURL=@azure_msal-browser.js.map
7
frontend/node_modules/.vite/deps/@azure_msal-browser.js.map
generated
vendored
Normal file
@@ -0,0 +1,7 @@
{
  "version": 3,
  "sources": [],
  "sourcesContent": [],
  "mappings": "",
  "names": []
}
561
frontend/node_modules/.vite/deps/@azure_msal-react.js
generated
vendored
Normal file
@@ -0,0 +1,561 @@
import {
  AccountEntity,
  AuthError,
  EventMessageUtils,
  EventType,
  InteractionRequiredAuthError,
  InteractionStatus,
  InteractionType,
  Logger,
  OIDC_DEFAULT_SCOPES,
  WrapperSKU,
  stubbedPublicClientApplication
} from "./chunk-DRYC2OA6.js";
import "./chunk-U543NHT4.js";
import {
  require_react
} from "./chunk-DRWLMN53.js";
import {
  __toESM
} from "./chunk-G3PMV62Z.js";

// node_modules/@azure/msal-react/dist/MsalContext.js
var React = __toESM(require_react(), 1);
var defaultMsalContext = {
  instance: stubbedPublicClientApplication,
  inProgress: InteractionStatus.None,
  accounts: [],
  logger: new Logger({})
};
var MsalContext = React.createContext(defaultMsalContext);
var MsalConsumer = MsalContext.Consumer;

// node_modules/@azure/msal-react/dist/MsalProvider.js
var import_react = __toESM(require_react(), 1);

// node_modules/@azure/msal-react/dist/utils/utilities.js
function getChildrenOrFunction(children, args) {
  if (typeof children === "function") {
    return children(args);
  }
  return children;
}
function accountArraysAreEqual(arrayA, arrayB) {
  if (arrayA.length !== arrayB.length) {
    return false;
  }
  const comparisonArray = [...arrayB];
  return arrayA.every((elementA) => {
    const elementB = comparisonArray.shift();
    if (!elementA || !elementB) {
      return false;
    }
    return elementA.homeAccountId === elementB.homeAccountId && elementA.localAccountId === elementB.localAccountId && elementA.username === elementB.username;
  });
}
function getAccountByIdentifiers(allAccounts, accountIdentifiers) {
  if (allAccounts.length > 0 && (accountIdentifiers.homeAccountId || accountIdentifiers.localAccountId || accountIdentifiers.username)) {
    const matchedAccounts = allAccounts.filter((accountObj) => {
      if (accountIdentifiers.username && accountIdentifiers.username.toLowerCase() !== accountObj.username.toLowerCase()) {
        return false;
      }
      if (accountIdentifiers.homeAccountId && accountIdentifiers.homeAccountId.toLowerCase() !== accountObj.homeAccountId.toLowerCase()) {
        return false;
      }
      if (accountIdentifiers.localAccountId && accountIdentifiers.localAccountId.toLowerCase() !== accountObj.localAccountId.toLowerCase()) {
        return false;
      }
      return true;
    });
    return matchedAccounts[0] || null;
  } else {
    return null;
  }
}

// node_modules/@azure/msal-react/dist/packageMetadata.js
var name = "@azure/msal-react";
var version = "3.0.15";

// node_modules/@azure/msal-react/dist/MsalProvider.js
var MsalProviderActionType = {
  UNBLOCK_INPROGRESS: "UNBLOCK_INPROGRESS",
  EVENT: "EVENT"
};
var reducer = (previousState, action) => {
  const { type, payload } = action;
  let newInProgress = previousState.inProgress;
  switch (type) {
    case MsalProviderActionType.UNBLOCK_INPROGRESS:
      if (previousState.inProgress === InteractionStatus.Startup) {
        newInProgress = InteractionStatus.None;
        payload.logger.info("MsalProvider - handleRedirectPromise resolved, setting inProgress to 'none'");
      }
      break;
    case MsalProviderActionType.EVENT:
      const message = payload.message;
      const status = EventMessageUtils.getInteractionStatusFromEvent(message, previousState.inProgress);
      if (status) {
        payload.logger.info(`MsalProvider - ${message.eventType} results in setting inProgress from ${previousState.inProgress} to ${status}`);
        newInProgress = status;
      }
      break;
    default:
      throw new Error(`Unknown action type: ${type}`);
  }
  if (newInProgress === InteractionStatus.Startup) {
    return previousState;
  }
  const currentAccounts = payload.instance.getAllAccounts();
  if (newInProgress !== previousState.inProgress && !accountArraysAreEqual(currentAccounts, previousState.accounts)) {
    return {
      ...previousState,
      inProgress: newInProgress,
      accounts: currentAccounts
    };
  } else if (newInProgress !== previousState.inProgress) {
    return {
      ...previousState,
      inProgress: newInProgress
    };
  } else if (!accountArraysAreEqual(currentAccounts, previousState.accounts)) {
    return {
      ...previousState,
      accounts: currentAccounts
    };
  } else {
    return previousState;
  }
};
function MsalProvider({ instance, children }) {
  (0, import_react.useEffect)(() => {
    instance.initializeWrapperLibrary(WrapperSKU.React, version);
  }, [instance]);
  const logger = (0, import_react.useMemo)(() => {
    return instance.getLogger().clone(name, version);
  }, [instance]);
  const [state, updateState] = (0, import_react.useReducer)(reducer, void 0, () => {
    return {
      inProgress: InteractionStatus.Startup,
      accounts: []
    };
  });
  (0, import_react.useEffect)(() => {
    const callbackId = instance.addEventCallback((message) => {
      updateState({
        payload: {
          instance,
          logger,
          message
        },
        type: MsalProviderActionType.EVENT
      });
    });
    logger.verbose(`MsalProvider - Registered event callback with id: ${callbackId}`);
    instance.initialize().then(() => {
      instance.handleRedirectPromise().catch(() => {
        return;
      }).finally(() => {
        updateState({
          payload: {
            instance,
            logger
          },
          type: MsalProviderActionType.UNBLOCK_INPROGRESS
        });
      });
    }).catch(() => {
      return;
    });
    return () => {
      if (callbackId) {
        logger.verbose(`MsalProvider - Removing event callback ${callbackId}`);
        instance.removeEventCallback(callbackId);
      }
    };
  }, [instance, logger]);
  const contextValue = {
    instance,
    inProgress: state.inProgress,
    accounts: state.accounts,
    logger
  };
  return import_react.default.createElement(MsalContext.Provider, { value: contextValue }, children);
}

// node_modules/@azure/msal-react/dist/components/AuthenticatedTemplate.js
var import_react4 = __toESM(require_react(), 1);

// node_modules/@azure/msal-react/dist/hooks/useMsal.js
var import_react2 = __toESM(require_react(), 1);
var useMsal = () => (0, import_react2.useContext)(MsalContext);

// node_modules/@azure/msal-react/dist/hooks/useIsAuthenticated.js
var import_react3 = __toESM(require_react(), 1);
function isAuthenticated(allAccounts, matchAccount) {
  if (matchAccount && (matchAccount.username || matchAccount.homeAccountId || matchAccount.localAccountId)) {
    return !!getAccountByIdentifiers(allAccounts, matchAccount);
  }
  return allAccounts.length > 0;
}
function useIsAuthenticated(matchAccount) {
  const { accounts: allAccounts, inProgress } = useMsal();
  const isUserAuthenticated = (0, import_react3.useMemo)(() => {
    if (inProgress === InteractionStatus.Startup) {
      return false;
    }
    return isAuthenticated(allAccounts, matchAccount);
  }, [allAccounts, inProgress, matchAccount]);
  return isUserAuthenticated;
}

// node_modules/@azure/msal-react/dist/components/AuthenticatedTemplate.js
function AuthenticatedTemplate({ username, homeAccountId, localAccountId, children }) {
  const context = useMsal();
  const accountIdentifier = (0, import_react4.useMemo)(() => {
    return {
      username,
      homeAccountId,
      localAccountId
    };
  }, [username, homeAccountId, localAccountId]);
  const isAuthenticated2 = useIsAuthenticated(accountIdentifier);
  if (isAuthenticated2 && context.inProgress !== InteractionStatus.Startup) {
    return import_react4.default.createElement(import_react4.default.Fragment, null, getChildrenOrFunction(children, context));
  }
  return null;
}

// node_modules/@azure/msal-react/dist/components/UnauthenticatedTemplate.js
var import_react5 = __toESM(require_react(), 1);
function UnauthenticatedTemplate({ username, homeAccountId, localAccountId, children }) {
  const context = useMsal();
  const accountIdentifier = (0, import_react5.useMemo)(() => {
    return {
      username,
      homeAccountId,
      localAccountId
    };
  }, [username, homeAccountId, localAccountId]);
  const isAuthenticated2 = useIsAuthenticated(accountIdentifier);
  if (!isAuthenticated2 && context.inProgress !== InteractionStatus.Startup && context.inProgress !== InteractionStatus.HandleRedirect) {
    return import_react5.default.createElement(import_react5.default.Fragment, null, getChildrenOrFunction(children, context));
  }
  return null;
}

// node_modules/@azure/msal-react/dist/components/MsalAuthenticationTemplate.js
var import_react8 = __toESM(require_react(), 1);

// node_modules/@azure/msal-react/dist/hooks/useMsalAuthentication.js
var import_react7 = __toESM(require_react(), 1);

// node_modules/@azure/msal-react/dist/hooks/useAccount.js
var import_react6 = __toESM(require_react(), 1);
function getAccount(instance, accountIdentifiers) {
  if (!accountIdentifiers || !accountIdentifiers.homeAccountId && !accountIdentifiers.localAccountId && !accountIdentifiers.username) {
    return instance.getActiveAccount();
  }
  return getAccountByIdentifiers(instance.getAllAccounts(), accountIdentifiers);
}
function useAccount(accountIdentifiers) {
  const { instance, inProgress, logger } = useMsal();
  const [account, setAccount] = (0, import_react6.useState)(() => {
    if (inProgress === InteractionStatus.Startup) {
      return null;
    } else {
      return getAccount(instance, accountIdentifiers);
    }
  });
  (0, import_react6.useEffect)(() => {
    if (inProgress !== InteractionStatus.Startup) {
      setAccount((currentAccount) => {
        const nextAccount = getAccount(instance, accountIdentifiers);
        if (!AccountEntity.accountInfoIsEqual(currentAccount, nextAccount, true)) {
          logger.info("useAccount - Updating account");
          return nextAccount;
        }
        return currentAccount;
      });
    }
  }, [inProgress, accountIdentifiers, instance, logger]);
  return account;
}

// node_modules/@azure/msal-react/dist/error/ReactAuthError.js
var ReactAuthErrorMessage = {
  invalidInteractionType: {
    code: "invalid_interaction_type",
    desc: "The provided interaction type is invalid."
  },
  unableToFallbackToInteraction: {
    code: "unable_to_fallback_to_interaction",
    desc: "Interaction is required but another interaction is already in progress. Please try again when the current interaction is complete."
  }
};
var ReactAuthError = class _ReactAuthError extends AuthError {
  constructor(errorCode, errorMessage) {
    super(errorCode, errorMessage);
    Object.setPrototypeOf(this, _ReactAuthError.prototype);
    this.name = "ReactAuthError";
  }
  static createInvalidInteractionTypeError() {
    return new _ReactAuthError(ReactAuthErrorMessage.invalidInteractionType.code, ReactAuthErrorMessage.invalidInteractionType.desc);
  }
  static createUnableToFallbackToInteractionError() {
    return new _ReactAuthError(ReactAuthErrorMessage.unableToFallbackToInteraction.code, ReactAuthErrorMessage.unableToFallbackToInteraction.desc);
  }
};

// node_modules/@azure/msal-react/dist/hooks/useMsalAuthentication.js
function useMsalAuthentication(interactionType, authenticationRequest, accountIdentifiers) {
  const { instance, inProgress, logger } = useMsal();
  const isAuthenticated2 = useIsAuthenticated(accountIdentifiers);
  const account = useAccount(accountIdentifiers);
  const [[result, error], setResponse] = (0, import_react7.useState)([null, null]);
  const mounted = (0, import_react7.useRef)(true);
  (0, import_react7.useEffect)(() => {
    return () => {
      mounted.current = false;
    };
  }, []);
  const interactionInProgress = (0, import_react7.useRef)(inProgress !== InteractionStatus.None);
  (0, import_react7.useEffect)(() => {
    interactionInProgress.current = inProgress !== InteractionStatus.None;
  }, [inProgress]);
  const shouldAcquireToken = (0, import_react7.useRef)(true);
  (0, import_react7.useEffect)(() => {
    if (!!error) {
      shouldAcquireToken.current = false;
      return;
    }
    if (!!result) {
      shouldAcquireToken.current = false;
      return;
    }
  }, [error, result]);
  const login = (0, import_react7.useCallback)(async (callbackInteractionType, callbackRequest) => {
    const loginType = callbackInteractionType || interactionType;
    const loginRequest = callbackRequest || authenticationRequest;
    switch (loginType) {
      case InteractionType.Popup:
        logger.verbose("useMsalAuthentication - Calling loginPopup");
        return instance.loginPopup(loginRequest);
      case InteractionType.Redirect:
        logger.verbose("useMsalAuthentication - Calling loginRedirect");
        return instance.loginRedirect(loginRequest).then(null);
      case InteractionType.Silent:
        logger.verbose("useMsalAuthentication - Calling ssoSilent");
        return instance.ssoSilent(loginRequest);
      default:
        throw ReactAuthError.createInvalidInteractionTypeError();
    }
  }, [instance, interactionType, authenticationRequest, logger]);
  const acquireToken = (0, import_react7.useCallback)(async (callbackInteractionType, callbackRequest) => {
    const fallbackInteractionType = callbackInteractionType || interactionType;
    let tokenRequest;
    if (callbackRequest) {
      logger.trace("useMsalAuthentication - acquireToken - Using request provided in the callback");
      tokenRequest = {
        ...callbackRequest
      };
    } else if (authenticationRequest) {
      logger.trace("useMsalAuthentication - acquireToken - Using request provided in the hook");
      tokenRequest = {
        ...authenticationRequest,
        scopes: authenticationRequest.scopes || OIDC_DEFAULT_SCOPES
      };
    } else {
      logger.trace("useMsalAuthentication - acquireToken - No request object provided, using default request.");
      tokenRequest = {
        scopes: OIDC_DEFAULT_SCOPES
      };
    }
    if (!tokenRequest.account && account) {
      logger.trace("useMsalAuthentication - acquireToken - Attaching account to request");
      tokenRequest.account = account;
    }
    const getToken = async () => {
      logger.verbose("useMsalAuthentication - Calling acquireTokenSilent");
      return instance.acquireTokenSilent(tokenRequest).catch(async (e) => {
        if (e instanceof InteractionRequiredAuthError) {
          if (!interactionInProgress.current) {
            logger.error("useMsalAuthentication - Interaction required, falling back to interaction");
            return login(fallbackInteractionType, tokenRequest);
          } else {
            logger.error("useMsalAuthentication - Interaction required but is already in progress. Please try again, if needed, after interaction completes.");
            throw ReactAuthError.createUnableToFallbackToInteractionError();
          }
        }
        throw e;
      });
    };
    return getToken().then((response) => {
      if (mounted.current) {
        setResponse([response, null]);
      }
      return response;
    }).catch((e) => {
      if (mounted.current) {
        setResponse([null, e]);
      }
      throw e;
    });
  }, [
    instance,
    interactionType,
    authenticationRequest,
    logger,
    account,
    login
  ]);
  (0, import_react7.useEffect)(() => {
    const callbackId = instance.addEventCallback((message) => {
      switch (message.eventType) {
        case EventType.LOGIN_SUCCESS:
        case EventType.SSO_SILENT_SUCCESS:
          if (message.payload) {
            setResponse([
              message.payload,
              null
            ]);
          }
          break;
        case EventType.LOGIN_FAILURE:
        case EventType.SSO_SILENT_FAILURE:
          if (message.error) {
            setResponse([null, message.error]);
          }
          break;
      }
    });
    logger.verbose(`useMsalAuthentication - Registered event callback with id: ${callbackId}`);
    return () => {
      if (callbackId) {
        logger.verbose(`useMsalAuthentication - Removing event callback ${callbackId}`);
        instance.removeEventCallback(callbackId);
      }
    };
  }, [instance, logger]);
  (0, import_react7.useEffect)(() => {
    if (shouldAcquireToken.current && inProgress === InteractionStatus.None) {
      if (!isAuthenticated2) {
        shouldAcquireToken.current = false;
        logger.info("useMsalAuthentication - No user is authenticated, attempting to login");
        login().catch(() => {
          return;
        });
      } else if (account) {
        shouldAcquireToken.current = false;
        logger.info("useMsalAuthentication - User is authenticated, attempting to acquire token");
        acquireToken().catch(() => {
          return;
        });
      }
    }
  }, [isAuthenticated2, account, inProgress, login, acquireToken, logger]);
  return {
    login,
    acquireToken,
    result,
    error
  };
}

// node_modules/@azure/msal-react/dist/components/MsalAuthenticationTemplate.js
function MsalAuthenticationTemplate({ interactionType, username, homeAccountId, localAccountId, authenticationRequest, loadingComponent: LoadingComponent, errorComponent: ErrorComponent, children }) {
  const accountIdentifier = (0, import_react8.useMemo)(() => {
    return {
      username,
      homeAccountId,
      localAccountId
    };
  }, [username, homeAccountId, localAccountId]);
  const context = useMsal();
  const msalAuthResult = useMsalAuthentication(interactionType, authenticationRequest, accountIdentifier);
  const isAuthenticated2 = useIsAuthenticated(accountIdentifier);
  if (msalAuthResult.error && context.inProgress === InteractionStatus.None) {
    if (!!ErrorComponent) {
      return import_react8.default.createElement(ErrorComponent, { ...msalAuthResult });
    }
    throw msalAuthResult.error;
  }
  if (isAuthenticated2) {
    return import_react8.default.createElement(import_react8.default.Fragment, null, getChildrenOrFunction(children, msalAuthResult));
  }
  if (!!LoadingComponent && context.inProgress !== InteractionStatus.None) {
    return import_react8.default.createElement(LoadingComponent, { ...context });
  }
  return null;
}

// node_modules/@azure/msal-react/dist/components/withMsal.js
var import_react9 = __toESM(require_react(), 1);
var withMsal = (Component) => {
  const ComponentWithMsal = (props) => {
    const msal = useMsal();
    return import_react9.default.createElement(Component, { ...props, msalContext: msal });
  };
  const componentName = Component.displayName || Component.name || "Component";
  ComponentWithMsal.displayName = `withMsal(${componentName})`;
  return ComponentWithMsal;
};
export {
  AuthenticatedTemplate,
  MsalAuthenticationTemplate,
  MsalConsumer,
  MsalContext,
  MsalProvider,
  UnauthenticatedTemplate,
  useAccount,
  useIsAuthenticated,
  useMsal,
  useMsalAuthentication,
  version,
  withMsal
};
/*! Bundled license information:

@azure/msal-react/dist/MsalContext.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/utils/utilities.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/packageMetadata.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/MsalProvider.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/hooks/useMsal.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/hooks/useIsAuthenticated.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/components/AuthenticatedTemplate.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/components/UnauthenticatedTemplate.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/hooks/useAccount.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/error/ReactAuthError.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/hooks/useMsalAuthentication.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/components/MsalAuthenticationTemplate.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/components/withMsal.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)

@azure/msal-react/dist/index.js:
  (*! @azure/msal-react v3.0.15 2025-07-08 *)
*/
//# sourceMappingURL=@azure_msal-react.js.map
7
frontend/node_modules/.vite/deps/@azure_msal-react.js.map
generated
vendored
Normal file
File diff suppressed because one or more lines are too long
8
frontend/node_modules/.vite/deps/BrowserPerformanceMeasurement-DAXTNWC7.js
generated
vendored
Normal file
@@ -0,0 +1,8 @@
import {
  BrowserPerformanceMeasurement
} from "./chunk-U543NHT4.js";
import "./chunk-G3PMV62Z.js";
export {
  BrowserPerformanceMeasurement
};
//# sourceMappingURL=BrowserPerformanceMeasurement-DAXTNWC7.js.map
7
frontend/node_modules/.vite/deps/BrowserPerformanceMeasurement-DAXTNWC7.js.map
generated
vendored
Normal file
@@ -0,0 +1,7 @@
{
  "version": 3,
  "sources": [],
  "sourcesContent": [],
  "mappings": "",
  "names": []
}
118
frontend/node_modules/.vite/deps/_metadata.json
generated
vendored
Normal file
118
frontend/node_modules/.vite/deps/_metadata.json
generated
vendored
Normal file
@@ -0,0 +1,118 @@
{
  "hash": "42012998",
  "configHash": "2337c4bc",
  "lockfileHash": "2912a9cc",
  "browserHash": "4c751707",
  "optimized": {
    "react": {
      "src": "../../react/index.js",
      "file": "react.js",
      "fileHash": "f754a42b",
      "needsInterop": true
    },
    "react-dom": {
      "src": "../../react-dom/index.js",
      "file": "react-dom.js",
      "fileHash": "93d3be4a",
      "needsInterop": true
    },
    "react/jsx-dev-runtime": {
      "src": "../../react/jsx-dev-runtime.js",
      "file": "react_jsx-dev-runtime.js",
      "fileHash": "58e54cfa",
      "needsInterop": true
    },
    "react/jsx-runtime": {
      "src": "../../react/jsx-runtime.js",
      "file": "react_jsx-runtime.js",
      "fileHash": "452eb51a",
      "needsInterop": true
    },
    "@azure/msal-browser": {
      "src": "../../@azure/msal-browser/dist/index.mjs",
      "file": "@azure_msal-browser.js",
      "fileHash": "af1eb922",
      "needsInterop": false
    },
    "@azure/msal-react": {
      "src": "../../@azure/msal-react/dist/index.js",
      "file": "@azure_msal-react.js",
      "fileHash": "78fcd178",
      "needsInterop": false
    },
    "axios": {
      "src": "../../axios/index.js",
      "file": "axios.js",
      "fileHash": "a41d57c8",
      "needsInterop": false
    },
    "lucide-react": {
      "src": "../../lucide-react/dist/esm/lucide-react.js",
      "file": "lucide-react.js",
      "fileHash": "ab000d47",
      "needsInterop": false
    },
    "react-dom/client": {
      "src": "../../react-dom/client.js",
      "file": "react-dom_client.js",
      "fileHash": "977e8266",
      "needsInterop": true
    },
    "react-dropzone": {
      "src": "../../react-dropzone/dist/es/index.js",
      "file": "react-dropzone.js",
      "fileHash": "c6fc4427",
      "needsInterop": false
    },
    "react-hook-form": {
      "src": "../../react-hook-form/dist/index.esm.mjs",
      "file": "react-hook-form.js",
      "fileHash": "76d14c40",
      "needsInterop": false
    },
    "react-hot-toast": {
      "src": "../../react-hot-toast/dist/index.mjs",
      "file": "react-hot-toast.js",
      "fileHash": "4cb655f1",
      "needsInterop": false
    },
    "react-query": {
      "src": "../../react-query/es/index.js",
      "file": "react-query.js",
      "fileHash": "72a776d0",
      "needsInterop": false
    },
    "react-router-dom": {
      "src": "../../react-router-dom/dist/index.js",
      "file": "react-router-dom.js",
      "fileHash": "1b99dcc7",
      "needsInterop": false
    },
    "showdown": {
      "src": "../../showdown/dist/showdown.js",
      "file": "showdown.js",
      "fileHash": "d5940d71",
      "needsInterop": true
    }
  },
  "chunks": {
    "BrowserPerformanceMeasurement-DAXTNWC7": {
      "file": "BrowserPerformanceMeasurement-DAXTNWC7.js"
    },
    "chunk-PJEEZAML": {
      "file": "chunk-PJEEZAML.js"
    },
    "chunk-DRYC2OA6": {
      "file": "chunk-DRYC2OA6.js"
    },
    "chunk-U543NHT4": {
      "file": "chunk-U543NHT4.js"
    },
    "chunk-DRWLMN53": {
      "file": "chunk-DRWLMN53.js"
    },
    "chunk-G3PMV62Z": {
      "file": "chunk-G3PMV62Z.js"
    }
  }
}
2523
frontend/node_modules/.vite/deps/axios.js
generated
vendored
Normal file
File diff suppressed because it is too large
Some files were not shown because too many files have changed in this diff.