contract-query/README.md
2025-08-14 15:15:47 -05:00

Contract Analysis Tool v2.0

A modern, production-ready Retrieval-Augmented Generation (RAG) application for intelligent contract analysis and document Q&A. It pairs a FastAPI backend with a React frontend and adds SSO integration, context-aware chat, and multi-format document processing.

🚀 Key Features

Core Functionality

  • Modern Architecture: FastAPI + React + MongoDB + Redis + ChromaDB
  • AI-Powered Analysis: OpenAI GPT-4 integration with contract summarization
  • Advanced RAG System: Context-aware document Q&A with source citations
  • Document Processing: Multi-format support (PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF)
  • Vector Search: ChromaDB for semantic similarity search

Authentication & Security

  • Dual Authentication: Local JWT + Azure AD/SSO integration
  • Role-Based Access Control: Admin/User permissions
  • JWT Token Management: Automatic refresh with 3-hour expiration
  • Secure File Upload: Validation, sanitization, and size limits
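
To make the token settings concrete, here is a minimal sketch of issuing and checking an HS256 token with a 180-minute lifetime, using only the standard library. It is not the project's actual auth code: the function names and the `role` claim are assumptions for illustration, and a real deployment would normally rely on a maintained JWT library and also validate the `alg` header.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(subject: str, role: str, secret: str,
                 expire_minutes: int = 180, now: Optional[float] = None) -> str:
    """Build a signed token; `role` is a hypothetical claim name."""
    issued = int(now if now is not None else time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "iat": issued,
               "exp": issued + expire_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the claims, or raise ValueError if the token is invalid or expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = now if now is not None else time.time()
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

A refresh flow would verify the old token and call `create_token` again with the same subject before the three hours run out.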

Advanced Chat System

  • Context-Aware Conversations: 24-hour rolling context window (max 10 messages)
  • Smart Caching: Context-dependent responses aren't cached; simple, repeated queries are
  • Real-time Statistics: Response times, cache hit rates, message counts
  • Proper Message Ordering: Chronological display with accurate timestamps
  • Source Citations: Direct references to document chunks in responses

Document Management

  • Batch Processing: Multiple document uploads with progress tracking
  • Index Organization: Create themed document collections
  • Processing Pipeline: PDF parsing → chunking → embedding → vector storage
  • Status Tracking: Real-time processing and embedding status
  • Contract Summaries: Automated contract analysis and key point extraction

Admin Features

  • System Statistics: Monitor usage, performance, and system health
  • User Management: Create, edit, and manage user accounts
  • Document Reprocessing: Retry failed documents
  • Index Management: Create and manage document indices
  • Advanced RAG Interface: Admin-specific query tools

🏗️ Architecture

React Frontend (Vite + Tailwind) → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                                        ↓
                                   Redis Cache
                                        ↓
                              Azure AD/SSO (Optional)

Data Flow:

  1. Documents uploaded through React frontend
  2. FastAPI processes with LlamaIndex (chunking, parsing)
  3. OpenAI embeddings stored in ChromaDB
  4. Metadata and user data in MongoDB
  5. RAG queries combine vector search + GPT-4 generation
  6. Redis caches responses for performance
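
Step 5 can be sketched in a few lines. The example below is illustrative only: it ranks chunks by cosine similarity in plain Python (ChromaDB does this at scale in the real system) and assembles a prompt that asks for cited answers. The function names, the tuple layout, and the prompt wording are all assumptions, and the toy two-dimensional vectors stand in for real OpenAI embeddings.

```python
import math
from typing import List, Sequence, Tuple

# Each chunk is (chunk_id, text, embedding). This layout is hypothetical.
Chunk = Tuple[str, str, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunks: List[Chunk], k: int = 3):
    """Return the k most similar chunks as (score, chunk_id, text), best first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id, text)
              for chunk_id, text, vec in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, retrieved) -> str:
    """Combine retrieved chunks and the question into one generation prompt."""
    sources = "\n".join(f"[{chunk_id}] {text}" for _, chunk_id, text in retrieved)
    return (
        "Answer using only the sources below and cite them by id.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    chunks = [
        ("c1", "Termination requires 30 days notice.", [1.0, 0.0]),
        ("c2", "Payment is due within 45 days.", [0.0, 1.0]),
        ("c3", "Either party may terminate for cause.", [0.9, 0.1]),
    ]
    print(build_prompt("How can the contract be terminated?",
                       top_k([1.0, 0.0], chunks, k=2)))
```

Keeping the chunk ids in the prompt is what lets the generated answer point back to specific document chunks.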

📋 Prerequisites

  • Python 3.11+ (Backend)
  • Node.js 18+ (Frontend)
  • MongoDB 7+ (Document metadata)
  • Redis 7+ (Caching - optional)
  • OpenAI API Key (Required)
  • LlamaParse API Key (Optional - enhanced PDF processing)

🛠️ Quick Start

Option 1: Docker Setup

# Clone repository
git clone <repository-url>
cd llama-contracts-master

# Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys

# Start backend services
cd backend
docker-compose up -d

# Start frontend
cd ../frontend
npm install
npm run dev

Option 2: Manual Development Setup

Backend Setup

cd backend
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env file with your settings

# Start services (macOS with Homebrew)
brew services start mongodb-community
brew services start redis

# Initialize database and start server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend Setup

cd frontend
npm install

# Configure environment  
cp .env.example .env
# Edit frontend/.env

# Start development server
npm run dev

⚙️ Configuration

Backend Environment Variables (.env)

# Database Configuration
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis

# Redis Cache (Optional)
REDIS_URL=redis://localhost:6379

# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key-change-in-production
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=180

# Azure AD/SSO (Optional)
AZURE_CLIENT_ID=your-azure-client-id
AZURE_TENANT_ID=your-azure-tenant-id
AZURE_REDIRECT_URI=http://localhost:3000/auth/callback
SSO_ENABLED=false
ALLOW_LOCAL_ADMIN=true

# AI Services (OPENAI_API_KEY is required; LLAMAPARSE_API_KEY is optional)
OPENAI_API_KEY=your-openai-api-key
LLAMAPARSE_API_KEY=your-llamaparse-api-key

# Application Settings
DEBUG=true
CORS_ORIGINS=["http://localhost:3000","http://localhost:3002"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices

# Performance Settings
# Cache is disabled for development
CACHE_ENABLED=false
CACHE_TTL=3600
MAX_DOCUMENT_CHARS=1000000
MAX_SUMMARY_CHARS=100000
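
As a rough illustration of how these values might be read at startup, the sketch below parses a few of them with the standard library. It is not the contents of `app/config/settings.py`: the `Settings` class and `load_settings` function are hypothetical, and the defaults simply mirror the sample values above. Note that `CORS_ORIGINS` is a JSON list and booleans arrive as strings.

```python
import json
import os
from dataclasses import dataclass
from typing import List, Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    database_name: str
    jwt_expire_minutes: int
    cors_origins: List[str]
    cache_enabled: bool
    cache_ttl: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping."""
    return Settings(
        mongodb_url=env.get("MONGODB_URL", "mongodb://localhost:27017"),
        database_name=env.get("DATABASE_NAME", "contract_analysis"),
        jwt_expire_minutes=int(env.get("JWT_EXPIRE_MINUTES", "180")),
        # CORS_ORIGINS is written as a JSON array in the .env file.
        cors_origins=json.loads(env.get("CORS_ORIGINS", "[]")),
        cache_enabled=_as_bool(env.get("CACHE_ENABLED", "false")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment.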

Frontend Environment Variables (.env)

VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool

🚀 Getting Started

1. Initialize System

# Create default users (first time only)
curl -X POST http://localhost:8000/api/v1/auth/init-users

2. Default Credentials

  • Admin: admin@oliver.agency / admin123
  • User: user@oliver.agency / user123

3. Basic Workflow

  1. Login at http://localhost:3000/login
  2. Create Index for your document collection
  3. Upload Documents (supports drag-and-drop)
  4. Wait for Processing (documents → chunks → embeddings)
  5. Start Chatting with natural language queries
  6. Review Sources cited in AI responses

📚 API Documentation

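Development URLs

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000
  • Swagger UI: http://localhost:8000/docs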

Core API Endpoints

Authentication

  • POST /api/v1/auth/login - Local login
  • POST /api/v1/auth/register - User registration
  • GET /api/v1/auth/me - Current user info
  • POST /api/v1/auth/refresh - Token refresh
  • POST /api/v1/auth/sso/validate - SSO login
  • GET /api/v1/auth/sso/config - SSO configuration

Document Management

  • POST /api/v1/documents/upload - Upload documents to index
  • GET /api/v1/documents/index/{index_id} - List documents in index
  • GET /api/v1/documents/{document_id} - Document details
  • DELETE /api/v1/documents/{document_id} - Delete document
  • GET /api/v1/documents/{document_id}/summary - Contract summary
  • POST /api/v1/documents/{document_id}/summary/reprocess - Regenerate summary

Index Management

  • POST /api/v1/indices/create - Create document index
  • GET /api/v1/indices/ - List user indices

Chat System

  • POST /api/v1/chat/query - RAG query with context
  • GET /api/v1/chat/history/{index_id} - Conversation history
  • DELETE /api/v1/chat/history/{index_id} - Clear chat history
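
For scripted access, a chat query could be prepared roughly as follows. Only the path and the Bearer header come from this README; the JSON field names `index_id` and `query` are assumptions, so check the Swagger UI for the real request schema before relying on them.

```python
import json
import urllib.request


def build_chat_query(base_url: str, token: str, index_id: str,
                     question: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the chat query endpoint."""
    # Field names below are hypothetical placeholders for the real schema.
    body = json.dumps({"index_id": index_id, "query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/chat/query",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

With the backend running, `urllib.request.urlopen(build_chat_query(...))` would send the request; the token comes from the login endpoint.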

Admin Operations

  • GET /api/v1/admin/stats - System statistics
  • POST /api/v1/admin/documents/upload-single - Single document upload
  • POST /api/v1/admin/documents/upload-multiple - Batch upload
  • POST /api/v1/admin/documents/{document_id}/reprocess - Reprocess document
  • GET /api/v1/admin/indices - All system indices
  • POST /api/v1/admin/chat/query - Admin RAG interface

🔧 Advanced Features

Context-Aware Chat System

  • Conversation Memory: AI remembers last 10 messages within 24 hours
  • Smart Context Usage: Follow-up questions reference previous conversation
  • Context Indicators: UI shows when AI uses conversation history
  • Session Statistics: Track response times and context usage
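
The memory rule (last 10 messages, none older than 24 hours) can be expressed as a small filter. This is a sketch under assumptions, not the code in `chat_context_service.py`: the `Message` dataclass and `select_context` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Message:
    role: str          # "user" or "assistant"
    content: str
    created_at: datetime


def select_context(history: List[Message], now: datetime,
                   max_age: timedelta = timedelta(hours=24),
                   max_messages: int = 10) -> List[Message]:
    """Return the newest messages inside the rolling window, oldest first."""
    cutoff = now - max_age
    recent = sorted(
        (m for m in history if m.created_at >= cutoff),
        key=lambda m: m.created_at,
    )
    # Keep only the tail so the prompt never exceeds max_messages entries.
    return recent[-max_messages:]
```

Returning the messages in chronological order keeps the prompt consistent with how the conversation is displayed.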

Document Processing Pipeline

  1. Upload: Drag-and-drop or browse files
  2. Validation: File type, size, and content checks
  3. Processing: LlamaIndex parsing and chunking
  4. Embedding: OpenAI embeddings generation
  5. Storage: ChromaDB vector storage + MongoDB metadata
  6. Indexing: Ready for semantic search
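
To show what the chunking in step 3 involves, here is a deliberately simple character-based splitter with overlap. In the application, LlamaIndex handles parsing and chunking with its own splitters; the function name and the default sizes below are assumptions chosen for illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a clause that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval.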

SSO Integration (Optional)

  • Azure AD Support: Enterprise authentication
  • Local Fallback: Admin accounts always available
  • Role Mapping: Automatic role assignment from SSO claims
  • Session Management: Unified token handling
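
Role mapping could look something like the sketch below. It assumes the validated token carries a `roles` claim, as Azure AD app roles do, and that one specific role value marks administrators; the value `ContractTool.Admin` and the function name are hypothetical.

```python
from typing import Mapping

# Hypothetical app-role value that grants admin rights in this tool.
ADMIN_ROLE_VALUES = {"ContractTool.Admin"}


def role_from_claims(claims: Mapping[str, object]) -> str:
    """Map SSO token claims to the application's Admin/User roles."""
    raw = claims.get("roles", [])
    # Accept either a single string or a list of role names.
    roles = [raw] if isinstance(raw, str) else list(raw or [])
    return "admin" if any(r in ADMIN_ROLE_VALUES for r in roles) else "user"
```

Defaulting to `user` when the claim is missing keeps an incomplete token from granting elevated access.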

Advanced Caching Strategy

  • Smart Cache Logic: Context-dependent queries bypass cache
  • Simple Query Cache: Repeated questions served from Redis
  • TTL Management: Configurable cache expiration
  • Cache Statistics: Monitor hit rates and performance
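
The bypass rule can be sketched with an in-memory stand-in for Redis. Everything here is illustrative: `QueryCache`, `answer`, and the `uses_context` flag are hypothetical names, and the real service may decide context dependence differently.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class QueryCache:
    """Tiny TTL cache with hit/miss counters (stands in for Redis)."""

    def __init__(self, ttl_seconds: int = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(index_id: str, question: str) -> str:
        # Normalize case and whitespace so trivially different phrasings match.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(f"{index_id}:{normalized}".encode()).hexdigest()

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        self._store.pop(key, None)  # drop expired entries
        self.misses += 1
        return None

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def answer(question: str, index_id: str, uses_context: bool,
           cache: QueryCache, generate: Callable[[str], str]) -> str:
    """Serve simple queries from cache; always regenerate contextual ones."""
    if uses_context:
        return generate(question)
    key = cache.key(index_id, question)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = generate(question)
    cache.set(key, result)
    return result
```

Skipping the cache for contextual queries matters because the same follow-up text ("and the renewal?") can mean different things in different conversations.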

🏗️ Project Structure

Backend (/backend)

app/
├── main.py              # FastAPI application entry point
├── config/
│   ├── settings.py      # Environment configuration
│   └── database.py      # Database connections
├── api/v1/              # API endpoints
│   ├── auth.py         # Authentication routes
│   ├── documents.py    # Document management
│   ├── indices.py      # Index operations
│   ├── chat.py         # Chat/RAG system
│   └── admin.py        # Admin operations
├── models/              # Data models
│   ├── user.py         # User models
│   ├── document.py     # Document models
│   ├── index.py        # Index models
│   ├── chat.py         # Chat models
│   └── contract_summary.py # Summary models
├── services/            # Business logic
│   ├── document_processor.py    # Document processing
│   ├── rag_service.py          # RAG implementation
│   ├── chat_context_service.py # Context management
│   ├── contract_summary_service.py # Summarization
│   └── sso_service.py          # SSO integration
├── core/                # Core utilities
│   ├── auth.py         # Authentication logic
│   ├── security.py     # Security utilities
│   ├── cache.py        # Caching logic
│   └── chroma_client.py # ChromaDB client
└── utils/               # Helper utilities
    └── file_utils.py    # File operations

Frontend (/frontend)

src/
├── App.jsx             # Main application with routing
├── pages/              # Page components
│   ├── HomePage.jsx    # Landing page
│   ├── Dashboard.jsx   # Main dashboard
│   ├── DocumentManager.jsx # Document management
│   ├── ChatInterface.jsx   # Chat interface
│   └── AdminPanel.jsx      # Admin interface
├── components/         # Reusable components
│   ├── auth/          # Authentication components
│   ├── admin/         # Admin-specific components
│   ├── chat/          # Chat components
│   ├── documents/     # Document components
│   ├── indices/       # Index components
│   └── common/        # Shared components
├── services/          # API service layer
│   ├── authService.js    # Authentication API
│   ├── documentService.js # Document API
│   ├── chatService.js    # Chat API
│   ├── indexService.js   # Index API
│   └── adminService.js   # Admin API
├── context/           # React context providers
│   └── AuthContext.jsx  # Authentication context
└── utils/             # Frontend utilities
    └── constants.js      # Application constants

🧪 Development & Testing

Backend Testing

cd backend
source venv/bin/activate

# API testing via Swagger
# Visit http://localhost:8000/docs

# Manual testing
python test_chat_fixes.py

Frontend Development

cd frontend

# Development server with hot reload
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

# Lint code
npm run lint

Database Management

# Open the app database in the MongoDB shell (mongosh replaces the legacy mongo shell)
mongosh contract_analysis

# View Redis cache
redis-cli
> KEYS *

# Clear ChromaDB indices
rm -rf backend/indices/chroma_db/

📊 Monitoring & Health Checks

System Health

# Backend health
curl http://localhost:8000/health

# Database connectivity
curl http://localhost:8000/api/v1/admin/stats \
  -H "Authorization: Bearer <admin-token>"

Performance Monitoring

  • Response Times: Tracked per request with X-Process-Time header
  • Cache Hit Rates: Monitor Redis performance
  • Document Processing: Track success/failure rates
  • Vector Search: Monitor ChromaDB query performance
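
One way a per-request timing header can be produced is with a small ASGI middleware. The sketch below is dependency-free and is not taken from `app/main.py`; the class name and the demo helpers are assumptions. Since FastAPI applications are ASGI apps, a class of this shape should be attachable with `app.add_middleware(...)`.

```python
import asyncio
import time


class ProcessTimeMiddleware:
    """Adds an X-Process-Time header (seconds) to every HTTP response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        started = time.perf_counter()

        async def send_with_timing(message):
            # Headers can only be added on the response-start message.
            if message["type"] == "http.response.start":
                elapsed = time.perf_counter() - started
                headers = list(message.get("headers", []))
                headers.append((b"x-process-time", f"{elapsed:.6f}".encode("ascii")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_timing)


async def hello_app(scope, receive, send):
    """Minimal ASGI app used only to demonstrate the middleware."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def call_once(app):
    """Drive one fake HTTP request through an ASGI app and collect its output."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app({"type": "http", "method": "GET", "path": "/health"}, receive, send))
    return sent
```

Calling `call_once(ProcessTimeMiddleware(hello_app))` returns the two response messages, the first of which carries the timing header.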

🐛 Troubleshooting

Common Issues

1. MongoDB Connection Issues

# Check if MongoDB is running
brew services list | grep mongodb
brew services start mongodb-community

# Check connection string in .env
MONGODB_URL=mongodb://localhost:27017

2. Redis Connection Issues

# Redis is optional - app continues without caching
brew services start redis

# Test Redis connection
redis-cli ping

3. Document Processing Failures

  • Check OpenAI API key validity
  • Verify file format support
  • Review file size limits (50MB default)
  • Check upload directory permissions
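
Before digging into server logs, a quick local check against the documented formats and size limit can rule out the obvious causes. This is a hypothetical helper, not the backend's validator; it assumes "50MB" means 50 × 1024 × 1024 bytes and takes the extension list from the supported formats named earlier.

```python
from pathlib import Path
from typing import List

# Extensions taken from the supported formats listed in this README.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".csv",
                      ".json", ".html", ".md", ".rtf"}
# Assumption: the 50MB default is measured in binary megabytes.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def upload_problems(filename: str, size_bytes: int) -> List[str]:
    """Return human-readable reasons a file would likely be rejected."""
    problems: List[str] = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file exceeds {MAX_UPLOAD_BYTES} bytes")
    return problems
```

An empty list means the file passes these two checks; processing can still fail later for other reasons, such as an invalid API key.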

4. ChromaDB Issues

# Clear and reinitialize indices
rm -rf backend/indices/chroma_db/
# Restart backend to recreate

5. Frontend Build Issues

cd frontend
rm -rf node_modules package-lock.json
npm install
npm run dev

Log Analysis

  • Backend: Console output from uvicorn
  • Frontend: Browser developer console
  • Database: MongoDB's own log output (location depends on the installation)
  • Processing: Check document status in MongoDB

🚀 Production Deployment

Production Checklist

  • Set strong JWT secret key
  • Configure production database URLs
  • Enable HTTPS with SSL certificates
  • Set up reverse proxy (Nginx/Apache)
  • Configure monitoring and logging
  • Set up regular MongoDB backups
  • Disable debug mode
  • Configure proper CORS origins
  • Set up log rotation

Docker Production

# Use production compose file
docker-compose -f docker-compose.prod.yml up -d

# Or build custom images
docker build -t contract-analysis-backend ./backend
docker build -t contract-analysis-frontend ./frontend

Environment Variables for Production

DEBUG=false
JWT_SECRET_KEY=your-ultra-secure-secret-key-here
CORS_ORIGINS=["https://yourdomain.com"]
CACHE_ENABLED=true

🔐 Security Considerations

Implemented Security

  • JWT Authentication with configurable expiration
  • Role-based Authorization (Admin/User)
  • Input Validation with Pydantic schemas
  • File Upload Validation (type, size, content)
  • CORS Protection with configurable origins
  • Environment Variable Protection
  • Injection Prevention (NoSQL datastore with input validation)
  • XSS Prevention (React built-in protection)

Best Practices

  • Regularly rotate JWT secret keys
  • Use HTTPS in production
  • Keep dependencies updated
  • Monitor for security vulnerabilities
  • Implement rate limiting for production
  • Regular security audits

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow existing code style and conventions
  • Add tests for new features
  • Update documentation as needed
  • Ensure all tests pass before submitting PR

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI - GPT-4 and embedding models
  • LlamaIndex - RAG framework and document processing
  • ChromaDB - Vector database for semantic search
  • FastAPI - Modern Python web framework
  • React - Frontend framework
  • Tailwind CSS - Utility-first CSS framework
  • Vite - Fast frontend build tool

Built with ❤️ for intelligent contract analysis and document Q&A

For detailed migration information from v1.0, see MIGRATION_PLAN.md.
For API testing guidance, see API_TESTING_GUIDE.md.