529 lines
No EOL
16 KiB
Markdown
529 lines
No EOL
16 KiB
Markdown
# Contract Analysis Tool v2.0
|
|
|
|
A modern, production-ready Retrieval-Augmented Generation (RAG) application for intelligent contract analysis and document Q&A. Built with FastAPI backend and React frontend with advanced features including SSO integration, context-aware chat, and comprehensive document processing.
|
|
|
|

|
|

|
|

|
|

|
|

|
|
|
|
## 🚀 Key Features
|
|
|
|
### Core Functionality
|
|
- **Modern Architecture**: FastAPI + React + MongoDB + Redis + ChromaDB
|
|
- **AI-Powered Analysis**: OpenAI GPT-4 integration with contract summarization
|
|
- **Advanced RAG System**: Context-aware document Q&A with source citations
|
|
- **Document Processing**: Multi-format support (PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF)
|
|
- **Vector Search**: ChromaDB for semantic similarity search
|
|
|
|
### Authentication & Security
|
|
- **Dual Authentication**: Local JWT + Azure AD/SSO integration
|
|
- **Role-Based Access Control**: Admin/User permissions
|
|
- **JWT Token Management**: Automatic refresh with 3-hour expiration
|
|
- **Secure File Upload**: Validation, sanitization, and size limits
|
|
|
|
### Advanced Chat System
|
|
- **Context-Aware Conversations**: 24-hour rolling context window (max 10 messages)
|
|
- **Smart Caching**: Context-dependent responses aren't cached, simple queries are
|
|
- **Real-time Statistics**: Response times, cache hit rates, message counts
|
|
- **Proper Message Ordering**: Chronological display with accurate timestamps
|
|
- **Source Citations**: Direct references to document chunks in responses
|
|
|
|
### Document Management
|
|
- **Batch Processing**: Multiple document uploads with progress tracking
|
|
- **Index Organization**: Create themed document collections
|
|
- **Processing Pipeline**: PDF parsing → chunking → embedding → vector storage
|
|
- **Status Tracking**: Real-time processing and embedding status
|
|
- **Contract Summaries**: Automated contract analysis and key point extraction
|
|
|
|
### Admin Features
|
|
- **System Statistics**: Monitor usage, performance, and system health
|
|
- **User Management**: Create, edit, and manage user accounts
|
|
- **Document Reprocessing**: Retry failed documents
|
|
- **Index Management**: Create and manage document indices
|
|
- **Advanced RAG Interface**: Admin-specific query tools
|
|
|
|
## 🏗️ Architecture
|
|
|
|
```
|
|
React Frontend (Vite + Tailwind) → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
|
|
↓
|
|
Redis Cache
|
|
↓
|
|
Azure AD/SSO (Optional)
|
|
```
|
|
|
|
**Data Flow:**
|
|
1. Documents uploaded through React frontend
|
|
2. FastAPI processes with LlamaIndex (chunking, parsing)
|
|
3. OpenAI embeddings stored in ChromaDB
|
|
4. Metadata and user data in MongoDB
|
|
5. RAG queries combine vector search + GPT-4 generation
|
|
6. Redis caches responses for performance
|
|
|
|
## 📋 Prerequisites
|
|
|
|
- **Python 3.11+** (Backend)
|
|
- **Node.js 18+** (Frontend)
|
|
- **MongoDB 7+** (Document metadata)
|
|
- **Redis 7+** (Caching - optional)
|
|
- **OpenAI API Key** (Required)
|
|
- **LlamaParse API Key** (Optional - enhanced PDF processing)
|
|
|
|
## 🛠️ Quick Start
|
|
|
|
### Option 1: Docker (Recommended)
|
|
|
|
```bash
|
|
# Clone repository
|
|
git clone <repository-url>
|
|
cd llama-contracts-master
|
|
|
|
# Configure environment
|
|
cp backend/.env.example backend/.env
|
|
# Edit backend/.env with your API keys
|
|
|
|
# Start backend services
|
|
cd backend
|
|
docker-compose up -d
|
|
|
|
# Start frontend
|
|
cd ../frontend
|
|
npm install
|
|
npm run dev
|
|
```
|
|
|
|
### Option 2: Manual Development Setup
|
|
|
|
#### Backend Setup
|
|
```bash
|
|
cd backend
|
|
python3 -m venv venv
|
|
source venv/bin/activate # Windows: venv\Scripts\activate
|
|
pip install -r requirements.txt
|
|
|
|
# Configure environment
|
|
cp .env.example .env
|
|
# Edit .env file with your settings
|
|
|
|
# Start services (macOS with Homebrew)
|
|
brew services start mongodb-community
|
|
brew services start redis
|
|
|
|
# Initialize database and start server
|
|
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
|
|
```
|
|
|
|
#### Frontend Setup
|
|
```bash
|
|
cd frontend
|
|
npm install
|
|
|
|
# Configure environment
|
|
cp .env.example .env
|
|
# Edit frontend/.env
|
|
|
|
# Start development server
|
|
npm run dev
|
|
```
|
|
|
|
## ⚙️ Configuration
|
|
|
|
### Backend Environment Variables (.env)
|
|
```env
|
|
# Database Configuration
|
|
MONGODB_URL=mongodb://localhost:27017
|
|
DATABASE_NAME=contract_analysis
|
|
|
|
# Redis Cache (Optional)
|
|
REDIS_URL=redis://localhost:6379
|
|
|
|
# Authentication
|
|
JWT_SECRET_KEY=your-super-secret-jwt-key-change-in-production
|
|
JWT_ALGORITHM=HS256
|
|
JWT_EXPIRE_MINUTES=180
|
|
|
|
# Azure AD/SSO (Optional)
|
|
AZURE_CLIENT_ID=your-azure-client-id
|
|
AZURE_TENANT_ID=your-azure-tenant-id
|
|
AZURE_REDIRECT_URI=http://localhost:3000/auth/callback
|
|
SSO_ENABLED=false
|
|
ALLOW_LOCAL_ADMIN=true
|
|
|
|
# AI Services (Required)
|
|
OPENAI_API_KEY=your-openai-api-key
|
|
LLAMAPARSE_API_KEY=your-llamaparse-api-key
|
|
|
|
# Application Settings
|
|
DEBUG=true
|
|
CORS_ORIGINS=["http://localhost:3000","http://localhost:3002"]
|
|
UPLOAD_DIR=./uploads
|
|
INDICES_DIR=./indices
|
|
|
|
# Performance Settings
|
|
CACHE_ENABLED=false # Disabled for development
|
|
CACHE_TTL=3600
|
|
MAX_DOCUMENT_CHARS=1000000
|
|
MAX_SUMMARY_CHARS=100000
|
|
```
|
|
|
|
### Frontend Environment Variables (.env)
|
|
```env
|
|
VITE_API_URL=http://localhost:8000
|
|
VITE_APP_NAME=Contract Analysis Tool
|
|
```
|
|
|
|
## 🚀 Getting Started
|
|
|
|
### 1. Initialize System
|
|
```bash
|
|
# Create default users (first time only)
|
|
curl -X POST http://localhost:8000/api/v1/auth/init-users
|
|
```
|
|
|
|
### 2. Default Credentials
|
|
- **Admin**: `admin@oliver.agency` / `admin123`
|
|
- **User**: `user@oliver.agency` / `user123`
|
|
|
|
### 3. Basic Workflow
|
|
1. **Login** at http://localhost:3000/login
|
|
2. **Create Index** for your document collection
|
|
3. **Upload Documents** (supports drag-and-drop)
|
|
4. **Wait for Processing** (documents → chunks → embeddings)
|
|
5. **Start Chatting** with natural language queries
|
|
6. **Review Sources** cited in AI responses
|
|
|
|
## 📚 API Documentation
|
|
|
|
### Development URLs
|
|
- **Application**: http://localhost:3000
|
|
- **API Docs**: http://localhost:8000/docs
|
|
- **ReDoc**: http://localhost:8000/redoc
|
|
- **Health Check**: http://localhost:8000/health
|
|
|
|
### Core API Endpoints
|
|
|
|
#### Authentication
|
|
- `POST /api/v1/auth/login` - Local login
|
|
- `POST /api/v1/auth/register` - User registration
|
|
- `GET /api/v1/auth/me` - Current user info
|
|
- `POST /api/v1/auth/refresh` - Token refresh
|
|
- `POST /api/v1/auth/sso/validate` - SSO login
|
|
- `GET /api/v1/auth/sso/config` - SSO configuration
|
|
|
|
#### Document Management
|
|
- `POST /api/v1/documents/upload` - Upload documents to index
|
|
- `GET /api/v1/documents/index/{index_id}` - List documents in index
|
|
- `GET /api/v1/documents/{document_id}` - Document details
|
|
- `DELETE /api/v1/documents/{document_id}` - Delete document
|
|
- `GET /api/v1/documents/{document_id}/summary` - Contract summary
|
|
- `POST /api/v1/documents/{document_id}/summary/reprocess` - Regenerate summary
|
|
|
|
#### Index Management
|
|
- `POST /api/v1/indices/create` - Create document index
|
|
- `GET /api/v1/indices/` - List user indices
|
|
|
|
#### Chat System
|
|
- `POST /api/v1/chat/query` - RAG query with context
|
|
- `GET /api/v1/chat/history/{index_id}` - Conversation history
|
|
- `DELETE /api/v1/chat/history/{index_id}` - Clear chat history
|
|
|
|
#### Admin Operations
|
|
- `GET /api/v1/admin/stats` - System statistics
|
|
- `POST /api/v1/admin/documents/upload-single` - Single document upload
|
|
- `POST /api/v1/admin/documents/upload-multiple` - Batch upload
|
|
- `POST /api/v1/admin/documents/{document_id}/reprocess` - Reprocess document
|
|
- `GET /api/v1/admin/indices` - All system indices
|
|
- `POST /api/v1/admin/chat/query` - Admin RAG interface
|
|
|
|
## 🔧 Advanced Features
|
|
|
|
### Context-Aware Chat System
|
|
- **Conversation Memory**: AI remembers last 10 messages within 24 hours
|
|
- **Smart Context Usage**: Follow-up questions reference previous conversation
|
|
- **Context Indicators**: UI shows when AI uses conversation history
|
|
- **Session Statistics**: Track response times and context usage
|
|
|
|
### Document Processing Pipeline
|
|
1. **Upload**: Drag-and-drop or browse files
|
|
2. **Validation**: File type, size, and content checks
|
|
3. **Processing**: LlamaIndex parsing and chunking
|
|
4. **Embedding**: OpenAI embeddings generation
|
|
5. **Storage**: ChromaDB vector storage + MongoDB metadata
|
|
6. **Indexing**: Ready for semantic search
|
|
|
|
### SSO Integration (Optional)
|
|
- **Azure AD Support**: Enterprise authentication
|
|
- **Local Fallback**: Admin accounts always available
|
|
- **Role Mapping**: Automatic role assignment from SSO claims
|
|
- **Session Management**: Unified token handling
|
|
|
|
### Advanced Caching Strategy
|
|
- **Smart Cache Logic**: Context-dependent queries bypass cache
|
|
- **Simple Query Cache**: Repeated questions served from Redis
|
|
- **TTL Management**: Configurable cache expiration
|
|
- **Cache Statistics**: Monitor hit rates and performance
|
|
|
|
## 🏗️ Project Structure
|
|
|
|
### Backend (`/backend`)
|
|
```
|
|
app/
|
|
├── main.py # FastAPI application entry point
|
|
├── config/
|
|
│ ├── settings.py # Environment configuration
|
|
│ └── database.py # Database connections
|
|
├── api/v1/ # API endpoints
|
|
│ ├── auth.py # Authentication routes
|
|
│ ├── documents.py # Document management
|
|
│ ├── indices.py # Index operations
|
|
│ ├── chat.py # Chat/RAG system
|
|
│ └── admin.py # Admin operations
|
|
├── models/ # Data models
|
|
│ ├── user.py # User models
|
|
│ ├── document.py # Document models
|
|
│ ├── index.py # Index models
|
|
│ ├── chat.py # Chat models
|
|
│ └── contract_summary.py # Summary models
|
|
├── services/ # Business logic
|
|
│ ├── document_processor.py # Document processing
|
|
│ ├── rag_service.py # RAG implementation
|
|
│ ├── chat_context_service.py # Context management
|
|
│ ├── contract_summary_service.py # Summarization
|
|
│ └── sso_service.py # SSO integration
|
|
├── core/ # Core utilities
|
|
│ ├── auth.py # Authentication logic
|
|
│ ├── security.py # Security utilities
|
|
│ ├── cache.py # Caching logic
|
|
│ └── chroma_client.py # ChromaDB client
|
|
└── utils/ # Helper utilities
|
|
└── file_utils.py # File operations
|
|
```
|
|
|
|
### Frontend (`/frontend`)
|
|
```
|
|
src/
|
|
├── App.jsx # Main application with routing
|
|
├── pages/ # Page components
|
|
│ ├── HomePage.jsx # Landing page
|
|
│ ├── Dashboard.jsx # Main dashboard
|
|
│ ├── DocumentManager.jsx # Document management
|
|
│ ├── ChatInterface.jsx # Chat interface
|
|
│ └── AdminPanel.jsx # Admin interface
|
|
├── components/ # Reusable components
|
|
│ ├── auth/ # Authentication components
|
|
│ ├── admin/ # Admin-specific components
|
|
│ ├── chat/ # Chat components
|
|
│ ├── documents/ # Document components
|
|
│ ├── indices/ # Index components
|
|
│ └── common/ # Shared components
|
|
├── services/ # API service layer
|
|
│ ├── authService.js # Authentication API
|
|
│ ├── documentService.js # Document API
|
|
│ ├── chatService.js # Chat API
|
|
│ ├── indexService.js # Index API
|
|
│ └── adminService.js # Admin API
|
|
├── context/ # React context providers
|
|
│ └── AuthContext.jsx # Authentication context
|
|
└── utils/ # Frontend utilities
|
|
└── constants.js # Application constants
|
|
```
|
|
|
|
## 🧪 Development & Testing
|
|
|
|
### Backend Testing
|
|
```bash
|
|
cd backend
|
|
source venv/bin/activate
|
|
|
|
# API testing via Swagger
|
|
# Visit http://localhost:8000/docs
|
|
|
|
# Manual testing
|
|
python test_chat_fixes.py
|
|
```
|
|
|
|
### Frontend Development
|
|
```bash
|
|
cd frontend
|
|
|
|
# Development server with hot reload
|
|
npm run dev
|
|
|
|
# Build for production
|
|
npm run build
|
|
|
|
# Preview production build
|
|
npm run preview
|
|
|
|
# Lint code
|
|
npm run lint
|
|
```
|
|
|
|
### Database Management
|
|
```bash
|
|
# View MongoDB collections
|
|
mongo contract_analysis
|
|
|
|
# View Redis cache
|
|
redis-cli
|
|
> KEYS *
|
|
|
|
# Clear ChromaDB indices
|
|
rm -rf backend/indices/chroma_db/
|
|
```
|
|
|
|
## 📊 Monitoring & Health Checks
|
|
|
|
### System Health
|
|
```bash
|
|
# Backend health
|
|
curl http://localhost:8000/health
|
|
|
|
# Database connectivity
|
|
curl http://localhost:8000/api/v1/admin/stats \
|
|
-H "Authorization: Bearer <admin-token>"
|
|
```
|
|
|
|
### Performance Monitoring
|
|
- **Response Times**: Tracked per request with `X-Process-Time` header
|
|
- **Cache Hit Rates**: Monitor Redis performance
|
|
- **Document Processing**: Track success/failure rates
|
|
- **Vector Search**: Monitor ChromaDB query performance
|
|
|
|
## 🐛 Troubleshooting
|
|
|
|
### Common Issues
|
|
|
|
**1. MongoDB Connection Issues**
|
|
```bash
|
|
# Check if MongoDB is running
|
|
brew services list | grep mongodb
|
|
brew services start mongodb-community
|
|
|
|
# Check connection string in .env
|
|
MONGODB_URL=mongodb://localhost:27017
|
|
```
|
|
|
|
**2. Redis Connection Issues**
|
|
```bash
|
|
# Redis is optional - app continues without caching
|
|
brew services start redis
|
|
|
|
# Test Redis connection
|
|
redis-cli ping
|
|
```
|
|
|
|
**3. Document Processing Failures**
|
|
- Check OpenAI API key validity
|
|
- Verify file format support
|
|
- Review file size limits (50MB default)
|
|
- Check upload directory permissions
|
|
|
|
**4. ChromaDB Issues**
|
|
```bash
|
|
# Clear and reinitialize indices
|
|
rm -rf backend/indices/chroma_db/
|
|
# Restart backend to recreate
|
|
```
|
|
|
|
**5. Frontend Build Issues**
|
|
```bash
|
|
cd frontend
|
|
rm -rf node_modules package-lock.json
|
|
npm install
|
|
npm run dev
|
|
```
|
|
|
|
### Log Analysis
|
|
- **Backend**: Console output from uvicorn
|
|
- **Frontend**: Browser developer console
|
|
- **Database**: MongoDB logs in system logs
|
|
- **Processing**: Check document status in MongoDB
|
|
|
|
## 🚀 Production Deployment
|
|
|
|
### Production Checklist
|
|
- [ ] Set strong JWT secret key
|
|
- [ ] Configure production database URLs
|
|
- [ ] Enable HTTPS with SSL certificates
|
|
- [ ] Set up reverse proxy (Nginx/Apache)
|
|
- [ ] Configure monitoring and logging
|
|
- [ ] Set up regular MongoDB backups
|
|
- [ ] Disable debug mode
|
|
- [ ] Configure proper CORS origins
|
|
- [ ] Set up log rotation
|
|
|
|
### Docker Production
|
|
```bash
|
|
# Use production compose file
|
|
docker-compose -f docker-compose.prod.yml up -d
|
|
|
|
# Or build custom images
|
|
docker build -t contract-analysis-backend ./backend
|
|
docker build -t contract-analysis-frontend ./frontend
|
|
```
|
|
|
|
### Environment Variables for Production
|
|
```env
|
|
DEBUG=false
|
|
JWT_SECRET_KEY=your-ultra-secure-secret-key-here
|
|
CORS_ORIGINS=["https://yourdomain.com"]
|
|
CACHE_ENABLED=true
|
|
```
|
|
|
|
## 🔐 Security Considerations
|
|
|
|
### Implemented Security
|
|
- **JWT Authentication** with configurable expiration
|
|
- **Role-based Authorization** (Admin/User)
|
|
- **Input Validation** with Pydantic schemas
|
|
- **File Upload Validation** (type, size, content)
|
|
- **CORS Protection** with configurable origins
|
|
- **Environment Variable Protection**
|
|
- **SQL Injection Prevention** (NoSQL with validation)
|
|
- **XSS Prevention** (React built-in protection)
|
|
|
|
### Best Practices
|
|
- Regularly rotate JWT secret keys
|
|
- Use HTTPS in production
|
|
- Keep dependencies updated
|
|
- Monitor for security vulnerabilities
|
|
- Implement rate limiting for production
|
|
- Regular security audits
|
|
|
|
## 🤝 Contributing
|
|
|
|
1. **Fork** the repository
|
|
2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
|
|
3. **Commit** changes (`git commit -m 'Add amazing feature'`)
|
|
4. **Push** to branch (`git push origin feature/amazing-feature`)
|
|
5. **Open** a Pull Request
|
|
|
|
### Development Guidelines
|
|
- Follow existing code style and conventions
|
|
- Add tests for new features
|
|
- Update documentation as needed
|
|
- Ensure all tests pass before submitting PR
|
|
|
|
## 📄 License
|
|
|
|
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|
|
|
|
## 🙏 Acknowledgments
|
|
|
|
- **[OpenAI](https://openai.com/)** - GPT-4 and embedding models
|
|
- **[LlamaIndex](https://www.llamaindex.ai/)** - RAG framework and document processing
|
|
- **[ChromaDB](https://www.trychroma.com/)** - Vector database for semantic search
|
|
- **[FastAPI](https://fastapi.tiangolo.com/)** - Modern Python web framework
|
|
- **[React](https://react.dev/)** - Frontend framework
|
|
- **[Tailwind CSS](https://tailwindcss.com/)** - Utility-first CSS framework
|
|
- **[Vite](https://vitejs.dev/)** - Fast frontend build tool
|
|
|
|
---
|
|
|
|
**Built with ❤️ for intelligent contract analysis and document Q&A**
|
|
|
|
*For detailed migration information from v1.0, see [MIGRATION_PLAN.md](MIGRATION_PLAN.md)*
|
|
*For API testing guidance, see [API_TESTING_GUIDE.md](API_TESTING_GUIDE.md)* |