# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is Contract Analysis Tool v2.0, a production-ready Retrieval-Augmented Generation (RAG) application for intelligent contract analysis and document Q&A. The system consists of a FastAPI backend and a React frontend.

## Architecture

**Stack:**
- **Backend:** FastAPI + MongoDB + Redis + ChromaDB
- **Frontend:** React + Vite + Tailwind CSS
- **AI/ML:** OpenAI GPT-4, LlamaIndex, ChromaDB for vector storage
- **Authentication:** JWT-based with role-based access control

**Data Flow:**
```
React Frontend → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                        ↓
                   Redis Cache
```

## Development Commands

### Backend (FastAPI)

**Start development server:**
```bash
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

**Install dependencies:**
```bash
cd backend
pip install -r requirements.txt
```

**Database setup:**
- MongoDB runs on port 27017
- Redis runs on port 6379
- The application auto-creates collections and indexes on startup

**Initialize default users:**
```bash
curl -X POST http://localhost:8000/api/v1/auth/init-users
```

**Health check:**
```bash
curl http://localhost:8000/health
```

### Frontend (React)

**Start development server:**
```bash
cd frontend
npm run dev
```

**Build for production:**
```bash
cd frontend
npm run build
```

**Lint code:**
```bash
cd frontend
npm run lint
```

**Install dependencies:**
```bash
cd frontend
npm install
```

### Docker Development

**Start all services:**
```bash
cd backend
docker-compose up -d
```

**Databases only (run the backend locally against them):**
```bash
cd backend
docker-compose up -d mongo redis
```

## Project Structure

### Backend (`/backend`)
- `app/main.py` - FastAPI application entry point
- `app/config/settings.py` - Environment configuration and database settings
- `app/api/v1/` - API endpoints (auth, documents, indices, chat, admin)
- `app/models/` - MongoDB data models (user, document, index, chat)
- `app/services/` - Business logic (document_processor, rag_service)
- `app/core/` - Core utilities (auth, security, cache)
- `app/utils/` - Helper utilities (file_utils)

### Frontend (`/frontend`)
- `src/App.jsx` - Main React application with routing
- `src/pages/` - Page components (Dashboard, DocumentManager, ChatInterface, AdminPanel)
- `src/components/` - Reusable UI components organized by feature
- `src/services/` - API service layer (authService, documentService, chatService, indexService)
- `src/context/` - React context providers (AuthContext)
- `src/utils/` - Frontend utilities and constants

## Key Features & Workflows

### Authentication System
- JWT-based authentication with role-based access (admin/user)
- Default users: `admin@oliver.agency`/`admin123`, `user@oliver.agency`/`user123`
- Protected routes with automatic token refresh

### Document Processing Pipeline
1. **Upload** → Document uploaded via the React frontend
2. **Process** → Backend processes it with LlamaIndex (PDF parsing, chunking)
3. **Index** → Embeddings stored in ChromaDB, metadata in MongoDB
4. **Query** → Natural language queries via the RAG system

### Index Management
- Users can create document indices to organize documents
- Role-based access control for index management
- ChromaDB handles vector storage; MongoDB stores metadata

### Chat System
- **Context-aware conversations** - the AI uses up to the 10 most recent previous messages from the last 24 hours
- **Real-time document Q&A** using RAG with source citations
- **Proper message ordering** - chronological display with correct timestamps
- **Conversation continuity** - responses reference previous context when relevant
- **Configurable top-k** results for query precision (3, 5, 10, 15)
- **Smart caching** - context-dependent responses are not cached; simple queries are
- **Session statistics** - track response times, cache hit rates, and message counts

## Environment Configuration

### Backend (`.env`)
```env
# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis

# Redis
REDIS_URL=redis://localhost:6379

# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30

# OpenAI
OPENAI_API_KEY=your-openai-api-key
LLAMAPARSE_API_KEY=your-llamaparse-api-key

# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices

# Cache
CACHE_ENABLED=true
CACHE_TTL=3600
```

### Frontend (`.env`)
```env
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
```

## API Endpoints

**Authentication:**
- `POST /api/v1/auth/login` - User login
- `POST /api/v1/auth/init-users` - Initialize default users

**Documents:**
- `POST /api/v1/documents/upload` - Upload documents to an index
- `GET /api/v1/documents/{index_id}` - List documents in an index

**Indices:**
- `POST /api/v1/indices/create` - Create a new document index
- `GET /api/v1/indices/` - List the user's indices

**Chat:**
- `POST /api/v1/chat/query` - Query documents with natural language

**Admin:**
- `GET /api/v1/admin/stats` - System statistics (admin only)
- `POST /api/v1/admin/documents/upload-single` - Upload a single document
- `POST /api/v1/admin/documents/upload-multiple` - Upload multiple documents
- `GET /api/v1/admin/documents/{index_id}` - Get index documents
- `POST /api/v1/admin/documents/{document_id}/reprocess` - Reprocess a document
- `DELETE /api/v1/admin/documents/{document_id}` - Delete a document
- `GET /api/v1/admin/indices` - Get all indices
- `POST /api/v1/admin/indices/create` - Create a new index
- `POST /api/v1/admin/chat/query` - RAG query interface

## Development Notes

### Database Connections
- MongoDB connection pooling is handled automatically
- Redis connection falls back gracefully if Redis is unavailable
- ChromaDB indices are stored in the `./indices` directory

### File Handling
- Uploads are stored in a `./uploads/{index_id}/` directory structure
- Supported formats: PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF
- 50MB file size limit (configurable)
- Automatic file naming for batch uploads

### Caching Strategy
- Redis caches API responses for performance
- TTL is configurable via the `CACHE_TTL` environment variable
- Cache keys include user context for security

### Document Processing
- Async processing with database status tracking
- Processing states: pending → processing → completed/failed
- Embedding states: pending → processing → completed/failed
- Automatic retry capability for failed documents
- Chunk count and vector ID tracking in MongoDB

### Vector Storage
- ChromaDB persistent storage in `./indices/chroma_db/`
- Collections are named `index_{index_id}`
- Metadata includes document_id, chunk_index, and index_id
- Configurable similarity search with top-k results

### Chat Context System
- **Context Window**: 24-hour rolling window with at most 10 previous messages
- **Smart Context**: The AI uses conversation history for continuity and follow-up questions
- **Context Caching**: Responses that use context are not cached (they are dynamic); simple queries are cached
- **Database Storage**: All messages are stored with proper timestamps and context metadata
- **Context Display**: The frontend shows when context was used and how many previous messages it included
- **Session Management**: Track conversation statistics and context usage

### Message Ordering & Timestamps
- **Chronological Order**: Messages are displayed in time sequence (oldest → newest)
- **Accurate Timestamps**: Server-side timestamp generation with UTC storage
- **Separate Timestamps**: User and assistant messages have distinct timestamps
- **Database Storage**: `created_at`, `user_timestamp`, and `assistant_timestamp` fields
- **Frontend Display**: Localized timestamp formatting with date and time
- **Context Indicators**: Visual indicators show when the AI used previous conversation context

### Error Handling & Validation
- **Collection Validation**: Check that the ChromaDB collection exists before querying
- **Document Status Check**: Verify documents are fully processed before chat
- **Graceful Degradation**: Fallback responses when context generation fails
- **User-Friendly Errors**: Clear, actionable error messages with next steps
- **Progress Tracking**: Real-time status updates during document processing

### Progress Visualization
- **Upload Progress**: Real-time progress bars during file uploads
- **Processing Status**: Visual indicators for document processing stages
- **Embedding Progress**: Separate progress tracking for text processing and embedding
- **Success States**: Clear visual feedback when operations complete
- **Status Dashboard**: Comprehensive view of the document processing pipeline

### Security Features
- JWT token validation on protected routes
- Input validation with Pydantic schemas
- CORS configuration for frontend integration
- File upload validation and sanitization

## Testing

### Backend Testing
```bash
cd backend
# API documentation available at http://localhost:8000/docs
# Manual testing via Swagger UI
```

### Frontend Testing
- React components use modern hooks patterns
- Error boundaries for graceful error handling
- Loading states for better UX

## Migration Context

This application was migrated from a PHP/Python stack to FastAPI/React. The migration maintained:
- Complete feature parity with the original application
- All document processing capabilities
- ChromaDB indices compatibility
- Enhanced performance and security
- A modern, responsive UI

The `MIGRATION_PLAN.md` file contains detailed information about the migration process and architecture decisions.