contract-query/MIGRATION_PLAN.md
2025-08-14 15:03:33 -05:00

Migration Plan: PHP/Python → FastAPI/React

Overview

This document outlines the strategy for migrating the current PHP/Python hybrid RAG application to a FastAPI backend with a React frontend.

Current Application Analysis

Existing Features

  • User Authentication: Role-based access (admin/user) with SQLite storage
  • Document Management: File uploads, processing, and indexing
  • RAG System: LlamaIndex + ChromaDB for document retrieval
  • Contract Analysis: GPT-4-powered contract field extraction
  • Chat Interface: Natural language document Q&A
  • Caching System: Response caching for performance
  • Index Management: User-specific access control to document indices

Current Architecture

PHP Frontend (Web UI) → Python Backend (Processing) → OpenAI API
         ↓                        ↓
    SQLite DB              ChromaDB Vectors

New Architecture

Target Architecture

React Frontend → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                      ↓
                 Redis Cache

Project Structure

Backend (FastAPI)

backend/
├── app/
│   ├── __init__.py
│   ├── main.py                    # FastAPI app entry point
│   ├── config/
│   │   ├── __init__.py
│   │   ├── settings.py            # Environment configuration
│   │   └── database.py            # MongoDB connection
│   ├── models/
│   │   ├── __init__.py
│   │   ├── user.py               # User data models
│   │   ├── document.py           # Document models
│   │   ├── index.py              # Index models
│   │   └── chat.py               # Chat/query models
│   ├── schemas/
│   │   ├── __init__.py
│   │   ├── user.py               # Pydantic schemas
│   │   ├── document.py
│   │   ├── index.py
│   │   └── chat.py
│   ├── api/
│   │   ├── __init__.py
│   │   ├── deps.py               # Dependencies
│   │   └── v1/
│   │       ├── __init__.py
│   │       ├── auth.py           # Authentication endpoints
│   │       ├── documents.py      # Document management
│   │       ├── indices.py        # Index management
│   │       ├── chat.py           # Chat/query endpoints
│   │       └── admin.py          # Admin endpoints
│   ├── core/
│   │   ├── __init__.py
│   │   ├── auth.py               # JWT authentication
│   │   ├── security.py           # Security utilities
│   │   └── cache.py              # Redis caching
│   ├── services/
│   │   ├── __init__.py
│   │   ├── document_processor.py # Document processing service
│   │   ├── rag_service.py        # RAG retrieval service
│   │   ├── index_service.py      # Index management service
│   │   └── openai_service.py     # OpenAI integration
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── file_utils.py         # File handling utilities
│   │   └── llama_utils.py        # LlamaIndex utilities
│   └── middleware/
│       ├── __init__.py
│       ├── cors.py               # CORS middleware
│       └── logging.py            # Request logging
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── .env.example

Frontend (React)

frontend/
├── index.html                    # Vite entry HTML (lives at the project root)
├── public/
│   └── favicon.ico
├── src/
│   ├── components/
│   │   ├── common/
│   │   │   ├── Header.jsx
│   │   │   ├── Sidebar.jsx
│   │   │   ├── Layout.jsx
│   │   │   └── LoadingSpinner.jsx
│   │   ├── auth/
│   │   │   ├── LoginForm.jsx
│   │   │   └── ProtectedRoute.jsx
│   │   ├── documents/
│   │   │   ├── DocumentUpload.jsx
│   │   │   ├── DocumentList.jsx
│   │   │   └── DocumentViewer.jsx
│   │   ├── chat/
│   │   │   ├── ChatInterface.jsx
│   │   │   ├── MessageList.jsx
│   │   │   └── MessageInput.jsx
│   │   ├── indices/
│   │   │   ├── IndexList.jsx
│   │   │   ├── IndexManager.jsx
│   │   │   └── CreateIndex.jsx
│   │   └── admin/
│   │       ├── UserManagement.jsx
│   │       └── SystemMonitor.jsx
│   ├── hooks/
│   │   ├── useAuth.js
│   │   ├── useDocuments.js
│   │   ├── useChat.js
│   │   └── useIndices.js
│   ├── services/
│   │   ├── api.js                # Axios configuration
│   │   ├── authService.js        # Authentication API calls
│   │   ├── documentService.js    # Document API calls
│   │   ├── chatService.js        # Chat API calls
│   │   └── indexService.js       # Index API calls
│   ├── context/
│   │   ├── AuthContext.js
│   │   └── ThemeContext.js
│   ├── utils/
│   │   ├── constants.js
│   │   ├── helpers.js
│   │   └── validation.js
│   ├── styles/
│   │   ├── globals.css
│   │   └── components/
│   ├── App.jsx
│   ├── index.js
│   └── routes.js
├── package.json
├── tailwind.config.js
├── vite.config.js
└── .env.example

Technology Stack

Backend

  • FastAPI: Modern, fast web framework for Python APIs
  • MongoDB: Document database for user data, metadata
  • ChromaDB: Vector database for document embeddings (kept from current)
  • Redis: Caching layer for improved performance
  • Pydantic: Data validation and serialization
  • JWT: Token-based authentication
  • LlamaIndex: RAG framework (kept from current)
  • OpenAI: GPT-4 for contract analysis, plus OpenAI embedding models for document embeddings

Frontend

  • React 18: Modern React with hooks
  • Vite: Fast build tool and dev server
  • Tailwind CSS: Utility-first CSS framework
  • Axios: HTTP client for API calls
  • React Router: Client-side routing
  • React Hook Form: Form handling
  • Zustand: State management
  • React Query: Server state management

Migration Strategy

Phase 1: Backend Foundation

  1. Set up FastAPI project structure
  2. Configure MongoDB connection
  3. Implement user authentication with JWT
  4. Create data models and schemas
  5. Set up Redis caching
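
To make step 3 concrete, the following is a minimal sketch of how an HS256 token is signed and verified, using only the Python standard library. The function names `create_token` and `verify_token` are hypothetical, and the 30-minute default mirrors `JWT_EXPIRE_MINUTES` from the environment configuration. A real implementation would more likely rely on a maintained JWT library than on hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    """HMAC-SHA256 over the 'header.claims' string."""
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_token(payload: dict, secret: str, expire_minutes: int = 30,
                 now: Optional[float] = None) -> str:
    """Return an HS256-signed token carrying `payload` plus an `exp` claim."""
    issued = time.time() if now is None else now
    claims = {**payload, "exp": int(issued + expire_minutes * 60)}
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the claims if the signature is valid and the token is unexpired."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{claims_b64}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    current = time.time() if now is None else now
    if claims["exp"] < current:
        raise ValueError("token expired")
    return claims
```

Carrying the user's role as a claim (for example `{"sub": "alice", "role": "admin"}`) would let the admin/user distinction survive the move away from the PHP login, though the exact claim layout is an open design choice.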

Phase 2: Core Services

  1. Port document processing pipeline
  2. Implement RAG service with LlamaIndex
  3. Create OpenAI integration service
  4. Implement index management
  5. Set up file upload handling
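
For step 5, one possible shape for upload validation is sketched below. The allowed extensions, the 20 MiB limit, and the `safe_storage_name` helper are all assumptions made for illustration; the plan does not specify them. The idea is to discard any client-supplied directory components and store the file under a generated name.

```python
import re
import uuid
from pathlib import PurePosixPath
from typing import Optional

# Assumed policy values; adjust to whatever the current application accepts.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def safe_storage_name(original_name: str, size_bytes: int,
                      unique_id: Optional[str] = None) -> str:
    """Validate an upload and return a name that is safe to write to disk."""
    # Keep only the final path component, treating both / and \ as separators.
    base = PurePosixPath(original_name.replace("\\", "/")).name
    suffix = PurePosixPath(base).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    if not 0 < size_bytes <= MAX_UPLOAD_BYTES:
        raise ValueError("file is empty or exceeds the upload limit")
    # Replace anything outside a conservative character set with underscores.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", PurePosixPath(base).stem).strip("._")
    token = unique_id or uuid.uuid4().hex
    return f"{token}_{stem or 'upload'}{suffix}"
```

The random prefix keeps two users' `contract.pdf` uploads from overwriting each other; the original filename can still be kept in the document metadata for display.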

Phase 3: API Endpoints

  1. Authentication endpoints
  2. Document management endpoints
  3. Chat/query endpoints
  4. Index management endpoints
  5. Admin endpoints

Phase 4: Frontend Development

  1. Set up React project with Vite
  2. Create authentication flow
  3. Build document management interface
  4. Implement chat interface
  5. Create admin dashboard

Phase 5: Integration & Testing

  1. Connect frontend to backend APIs
  2. Implement proper error handling
  3. Add loading states and UX improvements
  4. Performance optimization
  5. Security hardening

Phase 6: Deployment

  1. Docker containerization
  2. Environment configuration
  3. Production deployment setup
  4. Monitoring and logging

Data Migration

User Data

  • Migrate from SQLite to MongoDB
  • Transform user authentication to JWT
  • Preserve user roles and permissions
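
A rough sketch of the user transformation follows. The legacy schema is not documented here, so the `users` table and its `id`, `username`, `password_hash`, and `role` columns are assumptions, as are the function names. The sketch only builds the documents; inserting them with a MongoDB driver is left out.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional

ALLOWED_ROLES = {"admin", "user"}  # the two roles named in this plan


def to_user_document(row: dict, migrated_at: datetime) -> dict:
    """Map one legacy SQLite row to a MongoDB-style user document."""
    if row["role"] not in ALLOWED_ROLES:
        raise ValueError(f"unexpected role {row['role']!r} for user {row['username']!r}")
    return {
        "legacy_id": row["id"],                  # keeps a link back to the SQLite row
        "username": row["username"],
        "password_hash": row["password_hash"],   # carried over unchanged
        "role": row["role"],
        "migrated_at": migrated_at,
    }


def build_user_documents(conn: sqlite3.Connection,
                         migrated_at: Optional[datetime] = None) -> List[dict]:
    """Read every legacy user and return documents ready for insertion."""
    stamp = migrated_at or datetime.now(timezone.utc)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    rows = conn.execute(
        "SELECT id, username, password_hash, role FROM users ORDER BY id"
    )
    return [to_user_document(dict(row), stamp) for row in rows]
```

Failing loudly on an unknown role, rather than silently defaulting to `user`, makes it harder for a permissions mismatch to slip through the migration unnoticed.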

Document Indices

  • Keep existing ChromaDB indices
  • Update index metadata in MongoDB
  • Maintain document access permissions
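
One way the index metadata and its access rule could be modeled is sketched below. The field names (`chroma_collection`, `allowed_user_ids`) are hypothetical, and the rule that admins can see every index is an assumption that should be checked against the current application's behavior.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass(frozen=True)
class User:
    user_id: str
    role: str  # "admin" or "user"


@dataclass
class IndexMetadata:
    name: str
    chroma_collection: str  # name of the existing ChromaDB collection
    allowed_user_ids: Set[str] = field(default_factory=set)


def can_access(user: User, index: IndexMetadata) -> bool:
    """Admins see everything (assumed); users need an explicit grant."""
    return user.role == "admin" or user.user_id in index.allowed_user_ids


def visible_indices(user: User, indices: Iterable[IndexMetadata]) -> List[IndexMetadata]:
    """Filter the indices down to those the user may query."""
    return [index for index in indices if can_access(user, index)]
```

Because the metadata only references the ChromaDB collection by name, the vectors themselves would not need to be rewritten during the migration.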

Configuration

  • Environment variables migration
  • API key management
  • Cache configuration

Key Improvements

Performance

  • Async/await throughout backend
  • Redis caching for API responses
  • Optimized database queries
  • React Query for client-side caching
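
The response caching mentioned above would typically follow a cache-aside pattern. The sketch below uses an in-memory dictionary as a stand-in for Redis so that it runs without a server; the class name, key format, and 300-second TTL are illustrative assumptions.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryTTLCache:
    """Stand-in for Redis: stores values together with an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._store: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries are dropped lazily on read
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (value, self._clock() + ttl_seconds)


def cache_key(index_name: str, question: str) -> str:
    """Build a key from the index and a whitespace/case-normalized question."""
    normalized = " ".join(question.lower().split())
    digest = hashlib.sha256(f"{index_name}\n{normalized}".encode()).hexdigest()
    return f"rag:answer:{digest}"


def cached_answer(cache: InMemoryTTLCache, index_name: str, question: str,
                  compute: Callable[[str], str], ttl_seconds: float = 300) -> str:
    """Return a cached answer if present; otherwise compute and store it."""
    key = cache_key(index_name, question)
    hit = cache.get(key)
    if hit is not None:
        return hit
    answer = compute(question)
    cache.set(key, answer, ttl_seconds)
    return answer
```

Including the index name in the key matters here: the same question asked against two different document indices should not share a cached answer.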

Security

  • JWT-based authentication
  • Input validation with Pydantic
  • CORS configuration
  • Rate limiting
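
The plan does not say how rate limiting will be implemented. A token bucket is one common option, sketched here per client with in-process state; the class names and limits are hypothetical, and a multi-instance deployment would need shared state (for example in Redis) instead.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return whether the call may proceed."""
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._updated = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client key (for example a user ID or IP address)."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_key: str) -> bool:
        bucket = self._buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, self._clock)
            self._buckets[client_key] = bucket
        return bucket.allow()
```

A stricter limit on the chat/query endpoints than on the rest of the API would also cap OpenAI spend per user, since each query can trigger a GPT-4 call.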

Scalability

  • Microservice-ready architecture
  • Database connection pooling
  • Horizontal scaling support
  • Load balancing ready

Developer Experience

  • Type hints throughout Python code
  • API documentation with FastAPI
  • Modern React patterns
  • Hot reloading in development

User Experience

  • Modern, responsive UI
  • Real-time updates
  • Better error handling
  • Improved performance

Implementation Timeline

  1. Week 1: Backend foundation and authentication
  2. Week 2: Core services and API endpoints
  3. Week 3: Frontend setup and basic components
  4. Week 4: Integration and testing
  5. Week 5: Deployment and optimization

File Deletion Strategy

Legacy files will be deleted progressively, each group only after its replacement is complete and verified. These cleanup steps are numbered independently of the migration phases above:

  1. Remove PHP authentication files after the JWT implementation
  2. Remove PHP API files after the FastAPI endpoints are in place
  3. Remove Python processing scripts after the service implementation
  4. Remove remaining PHP files after frontend completion
  5. Clean up temporary files and outdated documentation

Environment Configuration

Backend (.env)

# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis

# Redis
REDIS_URL=redis://localhost:6379

# Authentication
JWT_SECRET_KEY=your-secret-key
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30

# OpenAI
OPENAI_API_KEY=your-openai-key

# LlamaParse
LLAMAPARSE_API_KEY=your-llamaparse-key

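# Application
DEBUG=false
# Must match the frontend origin; Vite's dev server uses port 5173 unless vite.config.js sets another port
CORS_ORIGINS=["http://localhost:3000"]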
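
As a sketch of how `settings.py` might read these values, the loader below parses an environment mapping into a typed, immutable object using only the standard library. The `Settings` class and `load_settings` function are hypothetical; the actual module may well use a Pydantic-based settings class instead.

```python
import json
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    database_name: str
    redis_url: str
    jwt_secret_key: str
    jwt_algorithm: str
    jwt_expire_minutes: int
    openai_api_key: str
    debug: bool
    cors_origins: Tuple[str, ...]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""

    def require(name: str) -> str:
        value = env.get(name)
        if not value:
            raise RuntimeError(f"missing required setting: {name}")
        return value

    # CORS_ORIGINS is written as a JSON list in the .env file.
    origins = json.loads(env.get("CORS_ORIGINS", "[]"))
    if not isinstance(origins, list) or not all(isinstance(o, str) for o in origins):
        raise RuntimeError("CORS_ORIGINS must be a JSON list of strings")

    return Settings(
        mongodb_url=require("MONGODB_URL"),
        database_name=require("DATABASE_NAME"),
        redis_url=require("REDIS_URL"),
        jwt_secret_key=require("JWT_SECRET_KEY"),
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        jwt_expire_minutes=int(env.get("JWT_EXPIRE_MINUTES", "30")),
        openai_api_key=require("OPENAI_API_KEY"),
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        cors_origins=tuple(origins),
    )
```

Failing at startup when a secret is missing is usually preferable to discovering it on the first authenticated request.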

Frontend (.env)

VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool

Success Criteria

  • Complete feature parity with current application
  • Improved performance (faster load times, better caching)
  • Modern, responsive UI
  • Scalable architecture
  • Comprehensive API documentation
  • Security improvements
  • Easy deployment and maintenance

This migration will modernize the application while maintaining all existing functionality and improving performance, security, and maintainability.