Contract Analysis Tool v2.0
A modern, production-ready Retrieval-Augmented Generation (RAG) application for intelligent contract analysis and document Q&A. It pairs a FastAPI backend with a React frontend and includes SSO integration, context-aware chat, and comprehensive document processing.
🚀 Key Features
Core Functionality
- Modern Architecture: FastAPI + React + MongoDB + Redis + ChromaDB
- AI-Powered Analysis: OpenAI GPT-4 integration with contract summarization
- Advanced RAG System: Context-aware document Q&A with source citations
- Document Processing: Multi-format support (PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF)
- Vector Search: ChromaDB for semantic similarity search
Authentication & Security
- Dual Authentication: Local JWT + Azure AD/SSO integration
- Role-Based Access Control: Admin/User permissions
- JWT Token Management: Automatic refresh with 3-hour expiration
- Secure File Upload: Validation, sanitization, and size limits
Advanced Chat System
- Context-Aware Conversations: 24-hour rolling context window (max 10 messages)
- Smart Caching: Context-dependent responses bypass the cache; simple repeated queries are served from it
- Real-time Statistics: Response times, cache hit rates, message counts
- Proper Message Ordering: Chronological display with accurate timestamps
- Source Citations: Direct references to document chunks in responses
Document Management
- Batch Processing: Multiple document uploads with progress tracking
- Index Organization: Create themed document collections
- Processing Pipeline: PDF parsing → chunking → embedding → vector storage
- Status Tracking: Real-time processing and embedding status
- Contract Summaries: Automated contract analysis and key point extraction
Admin Features
- System Statistics: Monitor usage, performance, and system health
- User Management: Create, edit, and manage user accounts
- Document Reprocessing: Retry failed documents
- Index Management: Create and manage document indices
- Advanced RAG Interface: Admin-specific query tools
🏗️ Architecture
React Frontend (Vite + Tailwind) → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
↓
Redis Cache
↓
Azure AD/SSO (Optional)
Data Flow:
- Documents uploaded through React frontend
- FastAPI processes with LlamaIndex (chunking, parsing)
- OpenAI embeddings stored in ChromaDB
- Metadata and user data in MongoDB
- RAG queries combine vector search + GPT-4 generation
- Redis caches responses for performance
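The retrieval step in the flow above can be illustrated with a toy example. This is a minimal sketch, not the application's code: in the real system ChromaDB performs the similarity search and OpenAI supplies the embeddings, whereas here the vectors are hand-written and the helper names (cosine_similarity, top_k_chunks, build_prompt) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=2):
    """Return the text of the k chunks most similar to the query.

    `chunks` is a list of (text, embedding) pairs. A vector store
    such as ChromaDB does this ranking at scale; this is a stand-in.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question, context_chunks):
    """Number the retrieved chunks so the answer can cite its sources."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
    return (
        "Answer using only the numbered sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The prompt returned by build_prompt is what would be sent to the generation model; numbering the chunks is one simple way to support source citations.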
📋 Prerequisites
- Python 3.11+ (Backend)
- Node.js 18+ (Frontend)
- MongoDB 7+ (Document metadata)
- Redis 7+ (Caching - optional)
- OpenAI API Key (Required)
- LlamaParse API Key (Optional - enhanced PDF processing)
🛠️ Quick Start
Option 1: Docker (Recommended)
# Clone repository
git clone <repository-url>
cd llama-contracts-master
# Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys
# Start backend services
cd backend
docker-compose up -d
# Start frontend
cd ../frontend
npm install
npm run dev
Option 2: Manual Development Setup
Backend Setup
cd backend
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env file with your settings
# Start services (macOS with Homebrew)
brew services start mongodb-community
brew services start redis
# Initialize database and start server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Frontend Setup
cd frontend
npm install
# Configure environment
cp .env.example .env
# Edit frontend/.env
# Start development server
npm run dev
⚙️ Configuration
Backend Environment Variables (.env)
# Database Configuration
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis
# Redis Cache (Optional)
REDIS_URL=redis://localhost:6379
# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key-change-in-production
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=180
# Azure AD/SSO (Optional)
AZURE_CLIENT_ID=your-azure-client-id
AZURE_TENANT_ID=your-azure-tenant-id
AZURE_REDIRECT_URI=http://localhost:3000/auth/callback
SSO_ENABLED=false
ALLOW_LOCAL_ADMIN=true
# AI Services (OPENAI_API_KEY required, LLAMAPARSE_API_KEY optional)
OPENAI_API_KEY=your-openai-api-key
LLAMAPARSE_API_KEY=your-llamaparse-api-key
# Application Settings
DEBUG=true
CORS_ORIGINS=["http://localhost:3000","http://localhost:3002"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices
# Performance Settings
CACHE_ENABLED=false # Disabled for development
CACHE_TTL=3600
MAX_DOCUMENT_CHARS=1000000
MAX_SUMMARY_CHARS=100000
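As an illustration of how these variables might be consumed, the sketch below reads a few of them into a typed settings object using only the standard library. It is an assumption about shape, not the contents of config/settings.py: the Settings class, load_settings function, and the fallback defaults are hypothetical. Note that CORS_ORIGINS is written as a JSON array, so it is parsed with json.loads.

```python
import json
import os
from dataclasses import dataclass


def _as_bool(value):
    """Interpret common truthy strings from .env files."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    database_name: str
    jwt_expire_minutes: int
    cors_origins: tuple
    cache_enabled: bool
    cache_ttl: int
    debug: bool


def load_settings(env=None):
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        mongodb_url=env.get("MONGODB_URL", "mongodb://localhost:27017"),
        database_name=env.get("DATABASE_NAME", "contract_analysis"),
        jwt_expire_minutes=int(env.get("JWT_EXPIRE_MINUTES", "180")),
        # CORS_ORIGINS is a JSON list, e.g. ["http://localhost:3000"]
        cors_origins=tuple(json.loads(env.get("CORS_ORIGINS", "[]"))),
        cache_enabled=_as_bool(env.get("CACHE_ENABLED", "false")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        debug=_as_bool(env.get("DEBUG", "false")),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests without touching the real environment.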
Frontend Environment Variables (.env)
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
🚀 Getting Started
1. Initialize System
# Create default users (first time only)
curl -X POST http://localhost:8000/api/v1/auth/init-users
2. Default Credentials
- Admin: admin@oliver.agency / admin123
- User: user@oliver.agency / user123
3. Basic Workflow
- Login at http://localhost:3000/login
- Create Index for your document collection
- Upload Documents (supports drag-and-drop)
- Wait for Processing (documents → chunks → embeddings)
- Start Chatting with natural language queries
- Review Sources cited in AI responses
📚 API Documentation
Development URLs
- Application: http://localhost:3000
- API Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- Health Check: http://localhost:8000/health
Core API Endpoints
Authentication
- POST /api/v1/auth/login - Local login
- POST /api/v1/auth/register - User registration
- GET /api/v1/auth/me - Current user info
- POST /api/v1/auth/refresh - Token refresh
- POST /api/v1/auth/sso/validate - SSO login
- GET /api/v1/auth/sso/config - SSO configuration
Document Management
- POST /api/v1/documents/upload - Upload documents to index
- GET /api/v1/documents/index/{index_id} - List documents in index
- GET /api/v1/documents/{document_id} - Document details
- DELETE /api/v1/documents/{document_id} - Delete document
- GET /api/v1/documents/{document_id}/summary - Contract summary
- POST /api/v1/documents/{document_id}/summary/reprocess - Regenerate summary
Index Management
- POST /api/v1/indices/create - Create document index
- GET /api/v1/indices/ - List user indices
Chat System
- POST /api/v1/chat/query - RAG query with context
- GET /api/v1/chat/history/{index_id} - Conversation history
- DELETE /api/v1/chat/history/{index_id} - Clear chat history
Admin Operations
- GET /api/v1/admin/stats - System statistics
- POST /api/v1/admin/documents/upload-single - Single document upload
- POST /api/v1/admin/documents/upload-multiple - Batch upload
- POST /api/v1/admin/documents/{document_id}/reprocess - Reprocess document
- GET /api/v1/admin/indices - All system indices
- POST /api/v1/admin/chat/query - Admin RAG interface
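A script can call these endpoints with nothing more than the standard library. The sketch below builds a login request and a chat query request. The endpoint paths come from the list above, but the JSON field names (email, password, index_id, query) are assumptions made for illustration; check the live schema at http://localhost:8000/docs before relying on them.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"


def build_login_request(email, password, api_url=API_URL):
    """POST /api/v1/auth/login. Body field names are assumed, not confirmed."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{api_url}/api/v1/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_chat_request(token, index_id, question, api_url=API_URL):
    """POST /api/v1/chat/query with a bearer token. Body shape is assumed."""
    body = json.dumps({"index_id": index_id, "query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{api_url}/api/v1/chat/query",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Typical use is send(build_login_request(...)) to obtain a token, then send(build_chat_request(token, ...)). The name of the token field in the login response is also something to confirm in the API docs.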
🔧 Advanced Features
Context-Aware Chat System
- Conversation Memory: AI remembers last 10 messages within 24 hours
- Smart Context Usage: Follow-up questions reference previous conversation
- Context Indicators: UI shows when AI uses conversation history
- Session Statistics: Track response times and context usage
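The rolling window described above (at most 10 messages, none older than 24 hours) can be sketched as a small selection function. This is an illustration under assumptions, not the code in chat_context_service.py: the message shape (a dict with a timezone-aware timestamp key) and the function name select_context are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_CONTEXT_MESSAGES = 10
CONTEXT_WINDOW = timedelta(hours=24)


def select_context(messages, now=None):
    """Return the most recent messages that fall inside the context window.

    Each message is assumed to be a dict with a timezone-aware
    "timestamp". The result is in chronological order.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - CONTEXT_WINDOW
    # Drop anything older than 24 hours, then keep the newest 10.
    recent = [m for m in messages if m["timestamp"] >= cutoff]
    recent.sort(key=lambda m: m["timestamp"])
    return recent[-MAX_CONTEXT_MESSAGES:]
```

Sorting before slicing keeps the selected messages in chronological order, which matches the ordering the chat display uses.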
Document Processing Pipeline
- Upload: Drag-and-drop or browse files
- Validation: File type, size, and content checks
- Processing: LlamaIndex parsing and chunking
- Embedding: OpenAI embeddings generation
- Storage: ChromaDB vector storage + MongoDB metadata
- Indexing: Ready for semantic search
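To make the chunking step concrete, here is a minimal word-based splitter with overlap. In the application LlamaIndex handles parsing and chunking; this stand-in only shows the general idea, and the function name and default sizes are hypothetical rather than the project's actual settings.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut
    at a boundary still appears intact in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be embedded and stored alongside its source metadata so that answers can cite the chunk they came from.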
SSO Integration (Optional)
- Azure AD Support: Enterprise authentication
- Local Fallback: Admin accounts always available
- Role Mapping: Automatic role assignment from SSO claims
- Session Management: Unified token handling
Advanced Caching Strategy
- Smart Cache Logic: Context-dependent queries bypass cache
- Simple Query Cache: Repeated questions served from Redis
- TTL Management: Configurable cache expiration
- Cache Statistics: Monitor hit rates and performance
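The cache rules above can be sketched with an in-memory stand-in for Redis. This is an assumption about how such logic could look, not the contents of core/cache.py: the QueryCache class, the key format, and the answer helper are hypothetical, and a dict with expiry timestamps replaces Redis so the example runs on its own.

```python
import hashlib
import time


class QueryCache:
    """Tiny TTL cache that also tracks hit and miss counts."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(index_id, question):
        # Normalize case and whitespace so trivially different
        # phrasings of the same question share one entry.
        normalized = " ".join(question.lower().split())
        digest = hashlib.sha256(f"{index_id}:{normalized}".encode("utf-8")).hexdigest()
        return f"rag:{digest}"

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] <= self._clock():
            self._store.pop(key, None)  # drop expired entries
            self.misses += 1
            return None
        self.hits += 1
        return entry[1]

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


def answer(cache, index_id, question, history, generate):
    """Serve simple queries from cache; bypass it when history is in play."""
    if history:
        # Context-dependent: the same words can need a different answer.
        return generate(question, history)
    key = cache.make_key(index_id, question)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = generate(question, history)
    cache.set(key, result)
    return result
```

The hit rate is hits / (hits + misses); follow-up questions never touch the counters because they skip the cache entirely.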
🏗️ Project Structure
Backend (/backend)
app/
├── main.py # FastAPI application entry point
├── config/
│ ├── settings.py # Environment configuration
│ └── database.py # Database connections
├── api/v1/ # API endpoints
│ ├── auth.py # Authentication routes
│ ├── documents.py # Document management
│ ├── indices.py # Index operations
│ ├── chat.py # Chat/RAG system
│ └── admin.py # Admin operations
├── models/ # Data models
│ ├── user.py # User models
│ ├── document.py # Document models
│ ├── index.py # Index models
│ ├── chat.py # Chat models
│ └── contract_summary.py # Summary models
├── services/ # Business logic
│ ├── document_processor.py # Document processing
│ ├── rag_service.py # RAG implementation
│ ├── chat_context_service.py # Context management
│ ├── contract_summary_service.py # Summarization
│ └── sso_service.py # SSO integration
├── core/ # Core utilities
│ ├── auth.py # Authentication logic
│ ├── security.py # Security utilities
│ ├── cache.py # Caching logic
│ └── chroma_client.py # ChromaDB client
└── utils/ # Helper utilities
└── file_utils.py # File operations
Frontend (/frontend)
src/
├── App.jsx # Main application with routing
├── pages/ # Page components
│ ├── HomePage.jsx # Landing page
│ ├── Dashboard.jsx # Main dashboard
│ ├── DocumentManager.jsx # Document management
│ ├── ChatInterface.jsx # Chat interface
│ └── AdminPanel.jsx # Admin interface
├── components/ # Reusable components
│ ├── auth/ # Authentication components
│ ├── admin/ # Admin-specific components
│ ├── chat/ # Chat components
│ ├── documents/ # Document components
│ ├── indices/ # Index components
│ └── common/ # Shared components
├── services/ # API service layer
│ ├── authService.js # Authentication API
│ ├── documentService.js # Document API
│ ├── chatService.js # Chat API
│ ├── indexService.js # Index API
│ └── adminService.js # Admin API
├── context/ # React context providers
│ └── AuthContext.jsx # Authentication context
└── utils/ # Frontend utilities
└── constants.js # Application constants
🧪 Development & Testing
Backend Testing
cd backend
source venv/bin/activate
# API testing via Swagger
# Visit http://localhost:8000/docs
# Manual testing
python test_chat_fixes.py
Frontend Development
cd frontend
# Development server with hot reload
npm run dev
# Build for production
npm run build
# Preview production build
npm run preview
# Lint code
npm run lint
Database Management
# View MongoDB collections
mongosh contract_analysis  # the legacy mongo shell is not shipped with MongoDB 6.0+
# View Redis cache
redis-cli
> KEYS *
# Clear ChromaDB indices
rm -rf backend/indices/chroma_db/
📊 Monitoring & Health Checks
System Health
# Backend health
curl http://localhost:8000/health
# Database connectivity
curl http://localhost:8000/api/v1/admin/stats \
-H "Authorization: Bearer <admin-token>"
Performance Monitoring
- Response Times: Tracked per request with the X-Process-Time header
- Cache Hit Rates: Monitor Redis performance
- Document Processing: Track success/failure rates
- Vector Search: Monitor ChromaDB query performance
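One way a per-request X-Process-Time header can be produced is with a small ASGI middleware that times the request and appends the header to the response. The sketch below is written against the plain ASGI interface so it runs without any framework installed; the class name ProcessTimeMiddleware is hypothetical and this is not necessarily how the project's main.py does it.

```python
import time


class ProcessTimeMiddleware:
    """Pure ASGI middleware that adds an X-Process-Time response header."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # Only HTTP requests get timed; pass everything else through.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()

        async def send_with_timing(message):
            if message["type"] == "http.response.start":
                elapsed = time.perf_counter() - start
                headers = list(message.get("headers", []))
                headers.append(
                    (b"x-process-time", f"{elapsed:.6f}".encode("ascii"))
                )
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_timing)
```

With FastAPI, a class like this can be registered through app.add_middleware(ProcessTimeMiddleware). The value measures time until the response starts, in seconds.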
🐛 Troubleshooting
Common Issues
1. MongoDB Connection Issues
# Check if MongoDB is running
brew services list | grep mongodb
brew services start mongodb-community
# Check connection string in .env
MONGODB_URL=mongodb://localhost:27017
2. Redis Connection Issues
# Redis is optional - app continues without caching
brew services start redis
# Test Redis connection
redis-cli ping
3. Document Processing Failures
- Check OpenAI API key validity
- Verify file format support
- Review file size limits (50MB default)
- Check upload directory permissions
4. ChromaDB Issues
# Clear and reinitialize indices
rm -rf backend/indices/chroma_db/
# Restart backend to recreate
5. Frontend Build Issues
cd frontend
rm -rf node_modules package-lock.json
npm install
npm run dev
Log Analysis
- Backend: Console output from uvicorn
- Frontend: Browser developer console
- Database: MongoDB logs in system logs
- Processing: Check document status in MongoDB
🚀 Production Deployment
Production Checklist
- Set strong JWT secret key
- Configure production database URLs
- Enable HTTPS with SSL certificates
- Set up reverse proxy (Nginx/Apache)
- Configure monitoring and logging
- Set up regular MongoDB backups
- Disable debug mode
- Configure proper CORS origins
- Set up log rotation
Docker Production
# Use production compose file
docker-compose -f docker-compose.prod.yml up -d
# Or build custom images
docker build -t contract-analysis-backend ./backend
docker build -t contract-analysis-frontend ./frontend
Environment Variables for Production
DEBUG=false
JWT_SECRET_KEY=your-ultra-secure-secret-key-here
CORS_ORIGINS=["https://yourdomain.com"]
CACHE_ENABLED=true
🔐 Security Considerations
Implemented Security
- JWT Authentication with configurable expiration
- Role-based Authorization (Admin/User)
- Input Validation with Pydantic schemas
- File Upload Validation (type, size, content)
- CORS Protection with configurable origins
- Environment Variable Protection
- Injection Prevention (MongoDB queries built from validated input)
- XSS Prevention (React built-in protection)
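To show what JWT authentication with configurable expiration involves, the sketch below signs and verifies an HS256 token using only the standard library, with the 180-minute lifetime from JWT_EXPIRE_MINUTES. It is for illustration only: the function names and the claim set (sub, role, iat, exp) are assumptions, and production code should use a maintained JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    return hmac.new(secret.encode("utf-8"), signing_input.encode("utf-8"),
                    hashlib.sha256).digest()


def create_token(subject, role, secret, expire_minutes=180, now=None):
    """Create an HS256 token that expires after `expire_minutes`."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role,
               "iat": issued_at, "exp": issued_at + expire_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token, secret, now=None):
    """Return the payload if signature and expiry check out, else raise."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # compare_digest avoids leaking information through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if payload["exp"] <= current:
        raise ValueError("token expired")
    return payload
```

The role claim is what a role-based check (Admin/User) would inspect after verification. Rotating the secret invalidates every outstanding token, since their signatures no longer match.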
Best Practices
- Regularly rotate JWT secret keys
- Use HTTPS in production
- Keep dependencies updated
- Monitor for security vulnerabilities
- Implement rate limiting for production
- Regular security audits
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Guidelines
- Follow existing code style and conventions
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting PR
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- OpenAI - GPT-4 and embedding models
- LlamaIndex - RAG framework and document processing
- ChromaDB - Vector database for semantic search
- FastAPI - Modern Python web framework
- React - Frontend framework
- Tailwind CSS - Utility-first CSS framework
- Vite - Fast frontend build tool
Built with ❤️ for intelligent contract analysis and document Q&A
For detailed migration information from v1.0, see MIGRATION_PLAN.md.
For API testing guidance, see API_TESTING_GUIDE.md.