contract-query/CLAUDE.md
2025-08-14 15:03:33 -05:00

9.5 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a modern Contract Analysis Tool v2.0 - a production-ready Retrieval-Augmented Generation (RAG) application for intelligent contract analysis and document Q&A. The system consists of a FastAPI backend and React frontend.

Architecture

Stack:

  • Backend: FastAPI + MongoDB + Redis + ChromaDB
  • Frontend: React + Vite + Tailwind CSS
  • AI/ML: OpenAI GPT-4, LlamaIndex, ChromaDB for vector storage
  • Authentication: JWT-based with role-based access control

Data Flow:

React Frontend → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                      ↓
                 Redis Cache

Development Commands

Backend (FastAPI)

Start development server:

cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Install dependencies:

cd backend
pip install -r requirements.txt

Database setup:

  • MongoDB runs on port 27017
  • Redis runs on port 6379
  • Application auto-creates collections/indexes on startup

Initialize default users:

curl -X POST http://localhost:8000/api/v1/auth/init-users

Health check:

curl http://localhost:8000/health

Frontend (React)

Start development server:

cd frontend
npm run dev

Build for production:

cd frontend
npm run build

Lint code:

cd frontend
npm run lint

Install dependencies:

cd frontend
npm install

Docker Development

Start all services:

cd backend
docker-compose up -d

Backend only (with external DB):

cd backend
docker-compose up -d mongo redis

Project Structure

Backend (/backend)

  • app/main.py - FastAPI application entry point
  • app/config/settings.py - Environment configuration and database settings
  • app/api/v1/ - API endpoints (auth, documents, indices, chat, admin)
  • app/models/ - MongoDB data models (user, document, index, chat)
  • app/services/ - Business logic (document_processor, rag_service)
  • app/core/ - Core utilities (auth, security, cache)
  • app/utils/ - Helper utilities (file_utils)

Frontend (/frontend)

  • src/App.jsx - Main React application with routing
  • src/pages/ - Page components (Dashboard, DocumentManager, ChatInterface, AdminPanel)
  • src/components/ - Reusable UI components organized by feature
  • src/services/ - API service layer (authService, documentService, chatService, indexService)
  • src/context/ - React context providers (AuthContext)
  • src/utils/ - Frontend utilities and constants

Key Features & Workflows

Authentication System

  • JWT-based authentication with role-based access (admin/user)
  • Default users: admin@oliver.agency/admin123, user@oliver.agency/user123
  • Protected routes with automatic token refresh

Document Processing Pipeline

  1. Upload → Document uploaded via React frontend
  2. Process → Backend processes with LlamaIndex (PDF parsing, chunking)
  3. Index → Embeddings stored in ChromaDB, metadata in MongoDB
  4. Query → Natural language queries via RAG system

Index Management

  • Users can create document indices for organizing documents
  • Role-based access control for index management
  • ChromaDB handles vector storage, MongoDB stores metadata

Chat System

  • Context-Aware Conversations: AI remembers previous 10 messages within 24-hour window
  • Real-time document Q&A using RAG with source citations
  • Proper message ordering - chronological display with correct timestamps
  • Conversation continuity - responses reference previous context when relevant
  • Configurable top-k results for query precision (3, 5, 10, 15)
  • Smart caching - context-dependent responses aren't cached, simple queries are
  • Session statistics - track response times, cache hit rates, message counts

Environment Configuration

Backend (.env)

# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis

# Redis
REDIS_URL=redis://localhost:6379

# Authentication
JWT_SECRET_KEY=your-super-secret-jwt-key
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30

# OpenAI
OPENAI_API_KEY=your-openai-api-key
LLAMAPARSE_API_KEY=your-llamaparse-api-key

# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
UPLOAD_DIR=./uploads
INDICES_DIR=./indices

# Cache
CACHE_ENABLED=true
CACHE_TTL=3600

Frontend (.env)

VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool

API Endpoints

Authentication:

  • POST /api/v1/auth/login - User login
  • POST /api/v1/auth/init-users - Initialize default users

Documents:

  • POST /api/v1/documents/upload - Upload documents to index
  • GET /api/v1/documents/{index_id} - List documents in index

Indices:

  • POST /api/v1/indices/create - Create new document index
  • GET /api/v1/indices/ - List user's indices

Chat:

  • POST /api/v1/chat/query - Query documents with natural language

Admin:

  • GET /api/v1/admin/stats - System statistics (admin only)
  • POST /api/v1/admin/documents/upload-single - Upload single document
  • POST /api/v1/admin/documents/upload-multiple - Upload multiple documents
  • GET /api/v1/admin/documents/{index_id} - Get index documents
  • POST /api/v1/admin/documents/{document_id}/reprocess - Reprocess document
  • DELETE /api/v1/admin/documents/{document_id} - Delete document
  • GET /api/v1/admin/indices - Get all indices
  • POST /api/v1/admin/indices/create - Create new index
  • POST /api/v1/admin/chat/query - RAG query interface

Development Notes

Database Connections

  • MongoDB connection pooling handled automatically
  • Redis connection with fallback if unavailable
  • ChromaDB indices stored in ./indices directory

File Handling

  • Uploads stored in ./uploads/{index_id}/ directory structure
  • Supported formats: PDF, DOCX, DOC, TXT, CSV, JSON, HTML, MD, RTF
  • 50MB file size limit (configurable)
  • Automatic file naming for batch uploads

Caching Strategy

  • Redis caches API responses for performance
  • TTL configurable via CACHE_TTL environment variable
  • Cache keys include user context for security

Document Processing

  • Async processing with database status tracking
  • Processing states: pending → processing → completed/failed
  • Embedding states: pending → processing → completed/failed
  • Automatic retry capability for failed documents
  • Chunk count and vector ID tracking in MongoDB

Vector Storage

  • ChromaDB persistent storage in ./indices/chroma_db/
  • Collections named index_{index_id} for organization
  • Metadata includes document_id, chunk_index, index_id
  • Configurable similarity search with top-k results

Chat Context System

  • Context Window: 24-hour rolling window with max 10 previous messages
  • Smart Context: AI uses conversation history for continuity and follow-up questions
  • Context Caching: Responses with context aren't cached (dynamic), simple queries are cached
  • Database Storage: All messages stored with proper timestamps and context metadata
  • Context Display: Frontend shows when context is used and how many previous messages
  • Session Management: Track conversation statistics and context usage

Message Ordering & Timestamps

  • Chronological Order: Messages displayed in proper time sequence (oldest → newest)
  • Accurate Timestamps: Server-side timestamp generation with UTC storage
  • Separate Timestamps: User and assistant messages have distinct timestamps
  • Proper Database Storage: created_at, user_timestamp, and assistant_timestamp fields
  • Frontend Display: Localized timestamp formatting with date and time
  • Context Indicators: Visual indicators show when AI used previous conversation context

Error Handling & Validation

  • Collection Validation: Check ChromaDB collection exists before querying
  • Document Status Check: Verify documents are fully processed before chat
  • Graceful Degradation: Fallback responses when context generation fails
  • User-Friendly Errors: Clear, actionable error messages with next steps
  • Progress Tracking: Real-time status updates during document processing

Progress Visualization

  • Upload Progress: Real-time progress bars during file uploads
  • Processing Status: Visual indicators for document processing stages
  • Embedding Progress: Separate progress tracking for text processing and embedding
  • Success States: Clear visual feedback when operations complete
  • Status Dashboard: Comprehensive view of document processing pipeline

Security Features

  • JWT token validation on protected routes
  • Input validation with Pydantic schemas
  • CORS configuration for frontend integration
  • File upload validation and sanitization

Testing

Backend Testing

cd backend
# API documentation available at http://localhost:8000/docs
# Manual testing via Swagger UI

Frontend Testing

  • React components use modern hooks patterns
  • Error boundaries for graceful error handling
  • Loading states for better UX

Migration Context

This is a migrated application from PHP/Python to FastAPI/React. The migration maintained:

  • Complete feature parity with the original application
  • All document processing capabilities
  • ChromaDB indices compatibility
  • Enhanced performance and security
  • Modern, responsive UI

The MIGRATION_PLAN.md file contains detailed information about the migration process and architecture decisions.