# Contract Analysis Tool v2.0

A modern, production-ready Retrieval-Augmented Generation (RAG) application for intelligent contract analysis and document Q&A. It combines a FastAPI backend with a React frontend and includes SSO integration, context-aware chat, and comprehensive document processing.

![Architecture](https://img.shields.io/badge/Backend-FastAPI-009688) ![Frontend](https://img.shields.io/badge/Frontend-React-61DAFB) ![Database](https://img.shields.io/badge/Database-MongoDB-47A248) ![Cache](https://img.shields.io/badge/Cache-Redis-DC382D) ![Vector Store](https://img.shields.io/badge/VectorDB-ChromaDB-FF6B35)

## 🚀 Key Features

### Core Functionality

- **Modern Architecture**: FastAPI + React + MongoDB + Redis + ChromaDB
- **AI-Powered Analysis**: OpenAI GPT-4 integration with contract summarization
- **Advanced RAG System**: Context-aware document Q&A with source citations
- **Document Processing**: Multi-format support (PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF)
- **Vector Search**: ChromaDB for semantic similarity search

### Authentication & Security

- **Dual Authentication**: Local JWT + Azure AD/SSO integration
- **Role-Based Access Control**: Admin/User permissions
- **JWT Token Management**: Automatic refresh with 3-hour expiration
- **Secure File Upload**: Validation, sanitization, and size limits

### Advanced Chat System

- **Context-Aware Conversations**: 24-hour rolling context window (max 10 messages)
- **Smart Caching**: Context-dependent responses are not cached; simple queries are
- **Real-time Statistics**: Response times, cache hit rates, message counts
- **Proper Message Ordering**: Chronological display with accurate timestamps
- **Source Citations**: Direct references to document chunks in responses

### Document Management

- **Batch Processing**: Multiple document uploads with progress tracking
- **Index Organization**: Create themed document collections
- **Processing Pipeline**: PDF parsing → chunking → embedding → vector storage
- **Status Tracking**: Real-time processing and embedding status
- **Contract Summaries**: Automated contract analysis and key point extraction

### Admin Features

- **System Statistics**: Monitor usage, performance, and system health
- **User Management**: Create, edit, and manage user accounts
- **Document Reprocessing**: Retry failed documents
- **Index Management**: Create and manage document indices
- **Advanced RAG Interface**: Admin-specific query tools

## 🏗️ Architecture

```
React Frontend (Vite + Tailwind) → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                                          ↓
                                     Redis Cache
                                          ↓
                               Azure AD/SSO (Optional)
```

**Data Flow:**

1. Documents are uploaded through the React frontend
2. FastAPI processes them with LlamaIndex (parsing, chunking)
3. OpenAI embeddings are stored in ChromaDB
4. Metadata and user data are stored in MongoDB
5. RAG queries combine vector search with GPT-4 generation
6. Redis caches responses for performance

## 📋 Prerequisites

- **Python 3.11+** (Backend)
- **Node.js 18+** (Frontend)
- **MongoDB 7+** (Document metadata)
- **Redis 7+** (Caching - optional)
- **OpenAI API Key** (Required)
- **LlamaParse API Key** (Optional - enhanced PDF processing)

## 🛠️ Quick Start

### Option 1: Docker (Recommended)

```bash
# Clone repository
git clone
cd llama-contracts-master

# Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys

# Start backend services
cd backend
docker-compose up -d

# Start frontend
cd ../frontend
npm install
npm run dev
```

### Option 2: Manual Development Setup

#### Backend Setup

```bash
cd backend
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env file with your settings

# Start services (macOS with Homebrew)
brew services start mongodb-community
brew services start redis

# Initialize database and start server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

#### Frontend Setup

```bash
cd frontend
npm install

# Configure environment
cp .env.example .env
# Edit frontend/.env

# Start development server
npm run dev
```

## ⚙️ Configuration

### Backend Environment Variables (.env)

```env
# Database Configuration
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis

# Redis Cache (Optional)
REDIS_URL=redis://localhost:6379

# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key-change-in-production
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=180

# Azure AD/SSO (Optional)
AZURE_CLIENT_ID=your-azure-client-id
AZURE_TENANT_ID=your-azure-tenant-id
AZURE_REDIRECT_URI=http://localhost:3000/auth/callback
SSO_ENABLED=false
ALLOW_LOCAL_ADMIN=true

# AI Services (Required)
OPENAI_API_KEY=your-openai-api-key
LLAMAPARSE_API_KEY=your-llamaparse-api-key

# Application Settings
DEBUG=true
CORS_ORIGINS=["http://localhost:3000","http://localhost:3002"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices

# Performance Settings
CACHE_ENABLED=false  # Disabled for development
CACHE_TTL=3600
MAX_DOCUMENT_CHARS=1000000
MAX_SUMMARY_CHARS=100000
```

### Frontend Environment Variables (.env)

```env
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
```

## 🚀 Getting Started

### 1. Initialize System

```bash
# Create default users (first time only)
curl -X POST http://localhost:8000/api/v1/auth/init-users
```

### 2. Default Credentials

- **Admin**: `admin@oliver.agency` / `admin123`
- **User**: `user@oliver.agency` / `user123`

### 3. Basic Workflow

1. **Login** at http://localhost:3000/login
2. **Create Index** for your document collection
3. **Upload Documents** (supports drag-and-drop)
4. **Wait for Processing** (documents → chunks → embeddings)
5. **Start Chatting** with natural language queries
6. **Review Sources** cited in AI responses

## 📚 API Documentation

### Development URLs

- **Application**: http://localhost:3000
- **API Docs**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
- **Health Check**: http://localhost:8000/health

### Core API Endpoints

#### Authentication

- `POST /api/v1/auth/login` - Local login
- `POST /api/v1/auth/register` - User registration
- `GET /api/v1/auth/me` - Current user info
- `POST /api/v1/auth/refresh` - Token refresh
- `POST /api/v1/auth/sso/validate` - SSO login
- `GET /api/v1/auth/sso/config` - SSO configuration

#### Document Management

- `POST /api/v1/documents/upload` - Upload documents to index
- `GET /api/v1/documents/index/{index_id}` - List documents in index
- `GET /api/v1/documents/{document_id}` - Document details
- `DELETE /api/v1/documents/{document_id}` - Delete document
- `GET /api/v1/documents/{document_id}/summary` - Contract summary
- `POST /api/v1/documents/{document_id}/summary/reprocess` - Regenerate summary

#### Index Management

- `POST /api/v1/indices/create` - Create document index
- `GET /api/v1/indices/` - List user indices

#### Chat System

- `POST /api/v1/chat/query` - RAG query with context
- `GET /api/v1/chat/history/{index_id}` - Conversation history
- `DELETE /api/v1/chat/history/{index_id}` - Clear chat history

#### Admin Operations

- `GET /api/v1/admin/stats` - System statistics
- `POST /api/v1/admin/documents/upload-single` - Single document upload
- `POST /api/v1/admin/documents/upload-multiple` - Batch upload
- `POST /api/v1/admin/documents/{document_id}/reprocess` - Reprocess document
- `GET /api/v1/admin/indices` - All system indices
- `POST /api/v1/admin/chat/query` - Admin RAG interface

## 🔧 Advanced Features

### Context-Aware Chat System

- **Conversation Memory**: AI remembers the last 10 messages within 24 hours
- **Smart Context Usage**: Follow-up questions reference the previous conversation
- **Context Indicators**: UI shows when the AI uses conversation history
- **Session Statistics**: Track response times and context usage

### Document Processing Pipeline

1. **Upload**: Drag-and-drop or browse files
2. **Validation**: File type, size, and content checks
3. **Processing**: LlamaIndex parsing and chunking
4. **Embedding**: OpenAI embeddings generation
5. **Storage**: ChromaDB vector storage + MongoDB metadata
6. **Indexing**: Ready for semantic search

### SSO Integration (Optional)

- **Azure AD Support**: Enterprise authentication
- **Local Fallback**: Admin accounts always available
- **Role Mapping**: Automatic role assignment from SSO claims
- **Session Management**: Unified token handling

### Advanced Caching Strategy

- **Smart Cache Logic**: Context-dependent queries bypass the cache
- **Simple Query Cache**: Repeated questions served from Redis
- **TTL Management**: Configurable cache expiration
- **Cache Statistics**: Monitor hit rates and performance

## 🏗️ Project Structure

### Backend (`/backend`)

```
app/
├── main.py                          # FastAPI application entry point
├── config/
│   ├── settings.py                  # Environment configuration
│   └── database.py                  # Database connections
├── api/v1/                          # API endpoints
│   ├── auth.py                      # Authentication routes
│   ├── documents.py                 # Document management
│   ├── indices.py                   # Index operations
│   ├── chat.py                      # Chat/RAG system
│   └── admin.py                     # Admin operations
├── models/                          # Data models
│   ├── user.py                      # User models
│   ├── document.py                  # Document models
│   ├── index.py                     # Index models
│   ├── chat.py                      # Chat models
│   └── contract_summary.py          # Summary models
├── services/                        # Business logic
│   ├── document_processor.py        # Document processing
│   ├── rag_service.py               # RAG implementation
│   ├── chat_context_service.py      # Context management
│   ├── contract_summary_service.py  # Summarization
│   └── sso_service.py               # SSO integration
├── core/                            # Core utilities
│   ├── auth.py                      # Authentication logic
│   ├── security.py                  # Security utilities
│   ├── cache.py                     # Caching logic
│   └── chroma_client.py             # ChromaDB client
└── utils/                           # Helper utilities
    └── file_utils.py                # File operations
```

### Frontend (`/frontend`)

```
src/
├── App.jsx                    # Main application with routing
├── pages/                     # Page components
│   ├── HomePage.jsx           # Landing page
│   ├── Dashboard.jsx          # Main dashboard
│   ├── DocumentManager.jsx    # Document management
│   ├── ChatInterface.jsx      # Chat interface
│   └── AdminPanel.jsx         # Admin interface
├── components/                # Reusable components
│   ├── auth/                  # Authentication components
│   ├── admin/                 # Admin-specific components
│   ├── chat/                  # Chat components
│   ├── documents/             # Document components
│   ├── indices/               # Index components
│   └── common/                # Shared components
├── services/                  # API service layer
│   ├── authService.js         # Authentication API
│   ├── documentService.js     # Document API
│   ├── chatService.js         # Chat API
│   ├── indexService.js        # Index API
│   └── adminService.js        # Admin API
├── context/                   # React context providers
│   └── AuthContext.jsx        # Authentication context
└── utils/                     # Frontend utilities
    └── constants.js           # Application constants
```

## 🧪 Development & Testing

### Backend Testing

```bash
cd backend
source venv/bin/activate

# API testing via Swagger
# Visit http://localhost:8000/docs

# Manual testing
python test_chat_fixes.py
```

### Frontend Development

```bash
cd frontend

# Development server with hot reload
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

# Lint code
npm run lint
```

### Database Management

```bash
# View MongoDB collections (mongosh replaces the legacy mongo shell, which MongoDB 6.0+ no longer ships)
mongosh contract_analysis

# View Redis cache keys
redis-cli KEYS '*'

# Clear ChromaDB indices
rm -rf backend/indices/chroma_db/
```

## 📊 Monitoring & Health Checks

### System Health

```bash
# Backend health
curl http://localhost:8000/health

# Database connectivity
curl http://localhost:8000/api/v1/admin/stats \
  -H "Authorization: Bearer "
```

### Performance Monitoring

- **Response Times**: Tracked per request with the `X-Process-Time` header
- **Cache Hit Rates**: Monitor Redis performance
- **Document Processing**: Track success/failure rates
- **Vector Search**: Monitor ChromaDB query performance

## 🐛 Troubleshooting

### Common Issues

**1. MongoDB Connection Issues**

```bash
# Check if MongoDB is running
brew services list | grep mongodb
brew services start mongodb-community

# Check connection string in .env
MONGODB_URL=mongodb://localhost:27017
```

**2. Redis Connection Issues**

```bash
# Redis is optional - app continues without caching
brew services start redis

# Test Redis connection
redis-cli ping
```

**3. Document Processing Failures**

- Check OpenAI API key validity
- Verify file format support
- Review file size limits (50MB default)
- Check upload directory permissions

**4. ChromaDB Issues**

```bash
# Clear and reinitialize indices
rm -rf backend/indices/chroma_db/
# Restart backend to recreate
```

**5. Frontend Build Issues**

```bash
cd frontend
rm -rf node_modules package-lock.json
npm install
npm run dev
```

### Log Analysis

- **Backend**: Console output from uvicorn
- **Frontend**: Browser developer console
- **Database**: MongoDB logs in system logs
- **Processing**: Check document status in MongoDB

## 🚀 Production Deployment

### Production Checklist

- [ ] Set strong JWT secret key
- [ ] Configure production database URLs
- [ ] Enable HTTPS with SSL certificates
- [ ] Set up reverse proxy (Nginx/Apache)
- [ ] Configure monitoring and logging
- [ ] Set up regular MongoDB backups
- [ ] Disable debug mode
- [ ] Configure proper CORS origins
- [ ] Set up log rotation

### Docker Production

```bash
# Use production compose file
docker-compose -f docker-compose.prod.yml up -d

# Or build custom images
docker build -t contract-analysis-backend ./backend
docker build -t contract-analysis-frontend ./frontend
```

### Environment Variables for Production

```env
DEBUG=false
JWT_SECRET_KEY=your-ultra-secure-secret-key-here
CORS_ORIGINS=["https://yourdomain.com"]
CACHE_ENABLED=true
```

## 🔐 Security Considerations

### Implemented Security

- **JWT Authentication** with configurable expiration
- **Role-based Authorization** (Admin/User)
- **Input Validation** with Pydantic schemas
- **File Upload Validation** (type, size, content)
- **CORS Protection** with configurable origins
- **Environment Variable Protection**
- **Injection Prevention** (NoSQL store with input validation)
- **XSS Prevention** (React built-in protection)

### Best Practices

- Regularly rotate JWT secret keys
- Use HTTPS in production
- Keep dependencies updated
- Monitor for security vulnerabilities
- Implement rate limiting for production
- Run regular security audits

## 🤝 Contributing

1. **Fork** the repository
2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
3. **Commit** changes (`git commit -m 'Add amazing feature'`)
4. **Push** to branch (`git push origin feature/amazing-feature`)
5. **Open** a Pull Request

### Development Guidelines

- Follow existing code style and conventions
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting a PR

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- **[OpenAI](https://openai.com/)** - GPT-4 and embedding models
- **[LlamaIndex](https://www.llamaindex.ai/)** - RAG framework and document processing
- **[ChromaDB](https://www.trychroma.com/)** - Vector database for semantic search
- **[FastAPI](https://fastapi.tiangolo.com/)** - Modern Python web framework
- **[React](https://react.dev/)** - Frontend framework
- **[Tailwind CSS](https://tailwindcss.com/)** - Utility-first CSS framework
- **[Vite](https://vitejs.dev/)** - Fast frontend build tool

---

**Built with ❤️ for intelligent contract analysis and document Q&A**

*For detailed migration information from v1.0, see [MIGRATION_PLAN.md](MIGRATION_PLAN.md)*

*For API testing guidance, see [API_TESTING_GUIDE.md](API_TESTING_GUIDE.md)*
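
The rolling context window described under Advanced Chat System (at most 10 messages from the last 24 hours, in chronological order) can be illustrated with a minimal sketch. The names `Message`, `select_context`, `MAX_CONTEXT_MESSAGES`, and `CONTEXT_WINDOW` are hypothetical and chosen for illustration; they are not taken from `chat_context_service.py`, whose actual implementation may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple, Optional


class Message(NamedTuple):
    """Hypothetical chat message record (not the project's actual model)."""
    role: str
    content: str
    created_at: datetime


# Limits stated in this README: 10 messages, 24 hours.
MAX_CONTEXT_MESSAGES = 10
CONTEXT_WINDOW = timedelta(hours=24)


def select_context(messages: List[Message],
                   now: Optional[datetime] = None) -> List[Message]:
    """Return the most recent messages inside the rolling window, oldest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - CONTEXT_WINDOW
    # Drop anything older than the window.
    recent = [m for m in messages if m.created_at >= cutoff]
    # Order chronologically, then keep only the newest N.
    recent.sort(key=lambda m: m.created_at)
    return recent[-MAX_CONTEXT_MESSAGES:]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    history = [Message("user", f"question {i}", now - timedelta(hours=i))
               for i in range(30)]
    for msg in select_context(history, now=now):
        print(msg.created_at.isoformat(), msg.content)
```

Filtering by age before truncating by count means a long-idle conversation starts with an empty context rather than reusing stale messages.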