contract-query/MIGRATION_PLAN.md
2025-08-14 15:03:33 -05:00

# Migration Plan: PHP/Python → FastAPI/React
## Overview
This document outlines the strategy for migrating the current PHP/Python hybrid RAG application to a FastAPI backend with a React frontend.
## Current Application Analysis
### Existing Features
- **User Authentication**: Role-based access (admin/user) with SQLite storage
- **Document Management**: File uploads, processing, and indexing
- **RAG System**: LlamaIndex + ChromaDB for document retrieval
- **Contract Analysis**: GPT-4 powered contract field extraction
- **Chat Interface**: Natural language document Q&A
- **Caching System**: Response caching for performance
- **Index Management**: User-specific access control to document indices
### Current Architecture
```
PHP Frontend (Web UI) → Python Backend (Processing) → OpenAI API
        ↓                          ↓
    SQLite DB               ChromaDB Vectors
```
## New Architecture
### Target Architecture
```
React Frontend → FastAPI Backend → MongoDB + ChromaDB → OpenAI API
                   Redis Cache
```
## Project Structure
### Backend (FastAPI)
```
backend/
├── app/
│   ├── __init__.py
│   ├── main.py  # FastAPI app entry point
│   ├── config/
│   │   ├── __init__.py
│   │   ├── settings.py  # Environment configuration
│   │   └── database.py  # MongoDB connection
│   ├── models/
│   │   ├── __init__.py
│   │   ├── user.py  # User data models
│   │   ├── document.py  # Document models
│   │   ├── index.py  # Index models
│   │   └── chat.py  # Chat/query models
│   ├── schemas/
│   │   ├── __init__.py
│   │   ├── user.py  # Pydantic schemas
│   │   ├── document.py
│   │   ├── index.py
│   │   └── chat.py
│   ├── api/
│   │   ├── __init__.py
│   │   ├── deps.py  # Dependencies
│   │   └── v1/
│   │       ├── __init__.py
│   │       ├── auth.py  # Authentication endpoints
│   │       ├── documents.py  # Document management
│   │       ├── indices.py  # Index management
│   │       ├── chat.py  # Chat/query endpoints
│   │       └── admin.py  # Admin endpoints
│   ├── core/
│   │   ├── __init__.py
│   │   ├── auth.py  # JWT authentication
│   │   ├── security.py  # Security utilities
│   │   └── cache.py  # Redis caching
│   ├── services/
│   │   ├── __init__.py
│   │   ├── document_processor.py  # Document processing service
│   │   ├── rag_service.py  # RAG retrieval service
│   │   ├── index_service.py  # Index management service
│   │   └── openai_service.py  # OpenAI integration
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── file_utils.py  # File handling utilities
│   │   └── llama_utils.py  # LlamaIndex utilities
│   └── middleware/
│       ├── __init__.py
│       ├── cors.py  # CORS middleware
│       └── logging.py  # Request logging
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── .env.example
```
### Frontend (React)
```
frontend/
├── public/
│   ├── index.html
│   └── favicon.ico
├── src/
│   ├── components/
│   │   ├── common/
│   │   │   ├── Header.jsx
│   │   │   ├── Sidebar.jsx
│   │   │   ├── Layout.jsx
│   │   │   └── LoadingSpinner.jsx
│   │   ├── auth/
│   │   │   ├── LoginForm.jsx
│   │   │   └── ProtectedRoute.jsx
│   │   ├── documents/
│   │   │   ├── DocumentUpload.jsx
│   │   │   ├── DocumentList.jsx
│   │   │   └── DocumentViewer.jsx
│   │   ├── chat/
│   │   │   ├── ChatInterface.jsx
│   │   │   ├── MessageList.jsx
│   │   │   └── MessageInput.jsx
│   │   ├── indices/
│   │   │   ├── IndexList.jsx
│   │   │   ├── IndexManager.jsx
│   │   │   └── CreateIndex.jsx
│   │   └── admin/
│   │       ├── UserManagement.jsx
│   │       └── SystemMonitor.jsx
│   ├── hooks/
│   │   ├── useAuth.js
│   │   ├── useDocuments.js
│   │   ├── useChat.js
│   │   └── useIndices.js
│   ├── services/
│   │   ├── api.js  # Axios configuration
│   │   ├── authService.js  # Authentication API calls
│   │   ├── documentService.js  # Document API calls
│   │   ├── chatService.js  # Chat API calls
│   │   └── indexService.js  # Index API calls
│   ├── context/
│   │   ├── AuthContext.js
│   │   └── ThemeContext.js
│   ├── utils/
│   │   ├── constants.js
│   │   ├── helpers.js
│   │   └── validation.js
│   ├── styles/
│   │   ├── globals.css
│   │   └── components/
│   ├── App.jsx
│   ├── index.js
│   └── routes.js
├── package.json
├── tailwind.config.js
├── vite.config.js
└── .env.example
```
## Technology Stack
### Backend
- **FastAPI**: Modern, fast web framework for Python APIs
- **MongoDB**: Document database for user data and metadata
- **ChromaDB**: Vector database for document embeddings (kept from current)
- **Redis**: Caching layer for improved performance
- **Pydantic**: Data validation and serialization
- **JWT**: Token-based authentication
- **LlamaIndex**: RAG framework (kept from current)
- **OpenAI**: GPT-4 for analysis, plus OpenAI embedding models for document embeddings
### Frontend
- **React 18**: Modern React with hooks
- **Vite**: Fast build tool and dev server
- **Tailwind CSS**: Utility-first CSS framework
- **Axios**: HTTP client for API calls
- **React Router**: Client-side routing
- **React Hook Form**: Form handling
- **Zustand**: State management
- **React Query**: Server state management
## Migration Strategy
### Phase 1: Backend Foundation
1. Set up FastAPI project structure
2. Configure MongoDB connection
3. Implement user authentication with JWT
4. Create data models and schemas
5. Set up Redis caching
### Phase 2: Core Services
1. Port document processing pipeline
2. Implement RAG service with LlamaIndex
3. Create OpenAI integration service
4. Implement index management
5. Set up file upload handling
### Phase 3: API Endpoints
1. Authentication endpoints
2. Document management endpoints
3. Chat/query endpoints
4. Index management endpoints
5. Admin endpoints
### Phase 4: Frontend Development
1. Set up React project with Vite
2. Create authentication flow
3. Build document management interface
4. Implement chat interface
5. Create admin dashboard
### Phase 5: Integration & Testing
1. Connect frontend to backend APIs
2. Implement proper error handling
3. Add loading states and UX improvements
4. Performance optimization
5. Security hardening
### Phase 6: Deployment
1. Docker containerization
2. Environment configuration
3. Production deployment setup
4. Monitoring and logging
## Data Migration
### User Data
- Migrate from SQLite to MongoDB
- Transform user authentication to JWT
- Preserve user roles and permissions
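
The user migration steps above could start from a small export routine. The following is a minimal sketch, assuming a hypothetical SQLite `users` table with `id`, `username`, `password_hash`, and `role` columns; the real schema may differ, and the field name `legacy_id` is also an assumption. It only builds MongoDB-style documents and leaves the actual insert as a comment.

```python
import sqlite3

# Roles named in the current application's feature list.
VALID_ROLES = ("admin", "user")


def sqlite_users_to_documents(conn: sqlite3.Connection) -> list:
    """Read rows from the (hypothetical) users table as MongoDB-style dicts."""
    rows = conn.execute(
        "SELECT id, username, password_hash, role FROM users ORDER BY id"
    ).fetchall()
    documents = []
    for legacy_id, username, password_hash, role in rows:
        if role not in VALID_ROLES:
            # Fail loudly so no role is silently changed during migration.
            raise ValueError(f"unexpected role {role!r} for user {username!r}")
        documents.append(
            {
                "legacy_id": legacy_id,  # keeps a link back to the SQLite row
                "username": username,
                "password_hash": password_hash,  # copied as-is, never re-hashed
                "role": role,
            }
        )
    return documents


# With pymongo installed, the documents could then be written with e.g.:
#   db["users"].insert_many(documents)
```

Keeping the original row id makes it possible to cross-check the two stores after the migration.
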
### Document Indices
- Keep existing ChromaDB indices
- Update index metadata in MongoDB
- Maintain document access permissions
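
One way to picture the index metadata kept in MongoDB is a small record per ChromaDB collection plus an access check. This is a sketch only: the field names (`chroma_collection`, `owner`, `allowed_users`) and the rule that admins can see every index are assumptions, not something the current application is confirmed to do.

```python
from dataclasses import dataclass, field


@dataclass
class IndexMetadata:
    """Hypothetical shape of one index metadata document."""

    name: str
    chroma_collection: str  # name of the existing ChromaDB collection
    owner: str
    allowed_users: set = field(default_factory=set)


def can_access(index: IndexMetadata, username: str, role: str) -> bool:
    """Return True if the user may query this index."""
    if role == "admin":  # assumption: admins have access to all indices
        return True
    return username == index.owner or username in index.allowed_users


def visible_indices(indices: list, username: str, role: str) -> list:
    """Names of the indices a user is allowed to see, in input order."""
    return [index.name for index in indices if can_access(index, username, role)]
```
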
### Configuration
- Environment variables migration
- API key management
- Cache configuration
## Key Improvements
### Performance
- Async/await throughout backend
- Redis caching for API responses
- Optimized database queries
- React Query for client-side caching
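
The response caching mentioned above usually follows a cache-aside pattern: look up a key, and only run the expensive query on a miss. The sketch below illustrates that pattern with an in-memory stand-in so it runs without a Redis server. `InMemoryTTLCache`, `cache_key`, `cached_answer`, the `rag:` key prefix, and the 300-second default TTL are all hypothetical names and values; a Redis client would take the place of the stand-in class.

```python
import hashlib
import json
import time


class InMemoryTTLCache:
    """Stand-in for a Redis client: stores values with an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop expired entries lazily
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)


def cache_key(index_id: str, question: str) -> str:
    """Derive a stable key from the index and a normalized question."""
    normalized = json.dumps(
        {"index": index_id, "q": question.strip().lower()}, sort_keys=True
    )
    return "rag:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cached_answer(cache, index_id, question, compute, ttl_seconds=300):
    """Cache-aside lookup: return a cached answer or compute and store it."""
    key = cache_key(index_id, question)
    hit = cache.get(key)
    if hit is not None:
        return hit
    answer = compute(index_id, question)
    cache.set(key, answer, ttl_seconds)
    return answer
```

Including the index id in the key keeps answers from one index from being served for another.
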
### Security
- JWT-based authentication
- Input validation with Pydantic
- CORS configuration
- Rate limiting
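
To make the JWT item above concrete, here is a minimal standard-library sketch of issuing and checking an HS256 token with an expiry. The function names and the `sub` and `role` claims are assumptions for illustration; in the actual backend a maintained JWT library would likely be a better choice than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()


def create_token(subject, role, secret, expire_minutes=30, now=None):
    """Build a signed token carrying the user name, role, and expiry."""
    now = int(time.time()) if now is None else int(now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "exp": now + expire_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token, secret, now=None):
    """Return the payload if signature and expiry are valid, else raise ValueError."""
    now = int(time.time()) if now is None else int(now)
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload
```

The signature is checked before the payload is parsed, so untrusted claims are never read from a token that fails verification.
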
### Scalability
- Microservice-ready architecture
- Database connection pooling
- Horizontal scaling support
- Load balancing ready
### Developer Experience
- Type hints throughout Python code
- API documentation with FastAPI
- Modern React patterns
- Hot reloading in development
### User Experience
- Modern, responsive UI
- Real-time updates
- Better error handling
- Improved performance
## Implementation Timeline
1. **Week 1**: Backend foundation and authentication
2. **Week 2**: Core services and API endpoints
3. **Week 3**: Frontend setup and basic components
4. **Week 4**: Integration and testing
5. **Week 5**: Deployment and optimization
## File Deletion Strategy
Files will be deleted progressively as new implementations are completed:
1. **Phase 1**: Remove PHP authentication files after JWT implementation
2. **Phase 2**: Remove PHP API files after FastAPI endpoints
3. **Phase 3**: Remove Python processing scripts after service implementation
4. **Phase 4**: Remove remaining PHP files after frontend completion
5. **Phase 5**: Clean up temporary files and documentation
## Environment Configuration
### Backend (.env)
```
# Database
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=contract_analysis
# Redis
REDIS_URL=redis://localhost:6379
# Authentication
JWT_SECRET_KEY=your-secret-key
JWT_ALGORITHM=HS256
JWT_EXPIRE_MINUTES=30
# OpenAI
OPENAI_API_KEY=your-openai-key
LLAMAPARSE_API_KEY=your-llamaparse-key
# Application
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
```
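
As an illustration of how `config/settings.py` might read these values, the sketch below loads them into a typed object using only the standard library. The `Settings` class, `load_settings` function, and the chosen defaults are assumptions; a Pydantic-based settings class would be a natural alternative given the stack above.

```python
import json
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Typed view of the backend .env values (hypothetical class)."""

    mongodb_url: str
    database_name: str
    redis_url: str
    jwt_secret_key: str
    jwt_algorithm: str
    jwt_expire_minutes: int
    openai_api_key: str
    debug: bool
    cors_origins: list


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    return Settings(
        mongodb_url=env.get("MONGODB_URL", "mongodb://localhost:27017"),
        database_name=env.get("DATABASE_NAME", "contract_analysis"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        # No default on purpose: a missing secret raises KeyError at startup.
        jwt_secret_key=env["JWT_SECRET_KEY"],
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        jwt_expire_minutes=int(env.get("JWT_EXPIRE_MINUTES", "30")),
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        # CORS_ORIGINS is written as a JSON array in the .env file.
        cors_origins=json.loads(env.get("CORS_ORIGINS", "[]")),
    )
```

Failing fast on a missing `JWT_SECRET_KEY` avoids starting the service with a guessable signing key.
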
### Frontend (.env)
```
VITE_API_URL=http://localhost:8000
VITE_APP_NAME=Contract Analysis Tool
```
## Success Criteria
- [ ] Complete feature parity with current application
- [ ] Improved performance (faster load times, better caching)
- [ ] Modern, responsive UI
- [ ] Scalable architecture
- [ ] Comprehensive API documentation
- [ ] Security improvements
- [ ] Easy deployment and maintenance

This migration will modernize the application while maintaining all existing functionality and improving performance, security, and maintainability.