Infrastructure Setup:
- Initialized Git repository
- Created project directory structure (backend, frontend, infrastructure)
- Configured Docker Compose with 4 services:
* PostgreSQL 16 (database)
* Redis 7 (cache & task queue)
* Qdrant (vector database)
* Backend (placeholder)
Configuration:
- Created comprehensive .env.example with all required environment variables
- Added .gitignore for Python and Node.js
- Created backend Dockerfile (placeholder for Phase 2)
- Added healthchecks for all services
Documentation:
- Created README.md with quick start guide and project overview
- Documented implementation plan (6 phases)
- Included concept and technical specifications
Verification:
- docker-compose build: ✅ Successful
- All services started and passed health checks:
* PostgreSQL: Accepting connections on port 5432
* Redis: Responding to ping on port 6379
* Qdrant: API healthy on port 6333
Next Phase: Backend Core (FastAPI, Entra ID Auth, RBAC, Alembic migrations)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
9.5 KiB
Perfect! I've completed the architectural interview. Here's the comprehensive specification based on your answers:
🎯 Enterprise AI Hub "Nexus" - Final Architecture Specification
- Core Technology Stack (Confirmed) Frontend: Next.js 14 (App Router), TypeScript, Shadcn/UI, Tailwind CSS, Zustand, TanStack Query Backend: FastAPI (Python 3.11+), LangChain/LangGraph Task Queue: Celery + Redis (exponential backoff retries, 3 attempts max) Auth: Microsoft Entra ID (OAuth2/OIDC) → Backend JWT (15min) + Refresh Tokens (7d) Databases: PostgreSQL (relational data) Qdrant (single collection sharepoint_docs, metadata-filtered) Redis (sessions, cache, task queue) Embeddings: OpenAI text-embedding-3-large (3072 dim)
- Database Schema Design Users & RBAC
users ├─ id (uuid, PK) ├─ entra_id (string, unique) -- Azure AD object ID ├─ email (string, unique) ├─ display_name (string) ├─ role (enum: 'super_admin', 'content_manager', 'user') ├─ department_id (FK → departments.id, nullable) ├─ country_id (FK → countries.id, nullable) ├─ created_at (timestamp) └─ last_login_at (timestamp)
departments ├─ id (uuid, PK) ├─ name (string, unique) └─ created_at (timestamp)
countries ├─ id (uuid, PK) ├─ code (string, unique) -- ISO 3166-1 alpha-2 ├─ name (string) └─ created_at (timestamp) Conversations & Messages
conversations ├─ id (uuid, PK) ├─ user_id (FK → users.id) ├─ mode (enum: 'rag', 'assistant', 'notebook') ├─ title (string, nullable) -- Auto-generated or user-set ├─ created_at (timestamp) └─ updated_at (timestamp)
messages ├─ id (uuid, PK) ├─ conversation_id (FK → conversations.id, ON DELETE CASCADE) ├─ role (enum: 'user', 'assistant', 'system') ├─ content (text) ├─ input_tokens (int, nullable) ├─ output_tokens (int, nullable) ├─ llm_provider (string, nullable) -- 'openai', 'azure', 'anthropic' ├─ llm_model (string, nullable) -- 'gpt-4-turbo', 'claude-sonnet-4.5' ├─ created_at (timestamp) └─ index: (conversation_id, created_at) Feedback System
feedback ├─ id (uuid, PK) ├─ message_id (FK → messages.id) ├─ conversation_id (FK → conversations.id) ├─ user_id (FK → users.id) ├─ rating (enum: 'positive', 'negative') ├─ user_comment (text, nullable) ├─ retrieved_sources (jsonb, nullable) -- [{doc_id, title, url, score}] ├─ status (enum: 'pending', 'reviewed', 'resolved') ├─ reviewed_by (FK → users.id, nullable) ├─ reviewed_at (timestamp, nullable) └─ created_at (timestamp) SharePoint Sync State
sharepoint_sync_state ├─ id (uuid, PK) ├─ library_url (string, unique) -- Full SharePoint library URL ├─ delta_token (string, nullable) -- MS Graph delta link ├─ last_sync_at (timestamp) ├─ status (enum: 'idle', 'syncing', 'error') └─ error_message (text, nullable)
sharepoint_documents ├─ id (uuid, PK) ├─ sharepoint_id (string, unique) -- MS Graph item ID ├─ file_name (string) ├─ file_url (string) -- Deep link to SharePoint ├─ library_url (string) ├─ department_id (FK → departments.id, nullable) ├─ country_id (FK → countries.id, nullable) ├─ file_type (string) -- pdf, docx, xlsx ├─ file_size (bigint) -- bytes ├─ last_modified (timestamp) -- From SharePoint ├─ qdrant_point_ids (jsonb) -- Array of vector IDs in Qdrant ├─ ingested_at (timestamp) ├─ is_active (boolean, default true) └─ index: (sharepoint_id), (is_active) Notebook Mode (NotebookLlama)
notebook_sessions ├─ id (uuid, PK) ├─ user_id (FK → users.id) ├─ title (string, nullable) ├─ conversation_id (FK → conversations.id) ├─ is_pinned (boolean, default false) ├─ expires_at (timestamp) -- 24h default, null if pinned ├─ created_at (timestamp) └─ index: (user_id, expires_at)
uploaded_files ├─ id (uuid, PK) ├─ session_id (FK → notebook_sessions.id, ON DELETE CASCADE) ├─ file_name (string) ├─ file_size (bigint) ├─ file_type (string) ├─ storage_path (string) -- Local filesystem or S3 ├─ notebookllama_file_id (string, nullable) └─ uploaded_at (timestamp) Admin Configuration
llm_providers ├─ id (uuid, PK) ├─ mode (enum: 'rag', 'assistant', 'notebook') ├─ provider (enum: 'openai', 'azure', 'anthropic') ├─ model_name (string) -- 'gpt-4-turbo', 'claude-sonnet-4.5' ├─ api_key_env_var (string) -- Reference to env var name ├─ is_active (boolean, default true) └─ updated_at (timestamp)
system_prompts ├─ id (uuid, PK) ├─ mode (enum: 'rag', 'assistant', 'notebook') ├─ prompt_text (text) ├─ is_active (boolean, default true) ├─ created_by (FK → users.id) └─ created_at (timestamp) 3. Qdrant Vector Store Structure Collection: sharepoint_docs
Vector Dimension: 3072 (text-embedding-3-large) Distance Metric: Cosine Payload Schema:
{ "sharepoint_id": "unique_doc_id", "file_name": "Policy_2024.pdf", "file_url": "https://sharepoint.com/...", "chunk_index": 0, "total_chunks": 12, "text": "Full chunk text...", "department_id": "uuid", "country_id": "uuid", "file_type": "pdf", "last_modified": "2024-11-15T10:00:00Z", "is_active": true } Indexes: Create payload indexes on department_id, country_id, is_active, file_type 4. Integration Patterns SharePoint Ingestion Pipeline
Celery Scheduled Task (hourly) ├─> Fetch MS Graph Delta API ├─> Identify: New, Modified, Deleted files ├─> For each changed document: │ ├─> Download file content │ ├─> Extract text (PyPDF2, python-docx, etc.) │ ├─> Chunk text (LangChain RecursiveCharacterTextSplitter) │ ├─> Generate embeddings (OpenAI API) │ ├─> Upsert to Qdrant with metadata │ └─> Update sharepoint_documents table └─> Update sharepoint_sync_state with new delta_token NotebookLlama Proxy Flow
Frontend Upload Request ├─> POST /api/notebook/sessions/{id}/upload ├─> FastAPI validates file (size, type) ├─> Save to local storage /tmp/uploads/{session_id}/ ├─> Forward to NotebookLlama POST /upload ├─> Store uploaded_files record └─> Return file_id to frontend
Chat Query ├─> POST /api/notebook/sessions/{id}/chat (SSE stream) ├─> Fetch session uploaded_files from PostgreSQL ├─> Forward to NotebookLlama POST /chat with context └─> Stream response token-by-token to frontend RAG Citation Format
User Query: "What is the vacation policy?"
LLM Response: "According to company policy [1], employees receive 20 days of annual leave [2]. This applies to all full-time staff [1].
Sources: [1] HR_Policy_2024.pdf - https://sharepoint.com/sites/hr/HR_Policy_2024.pdf [2] Benefits_Guide.docx - https://sharepoint.com/sites/hr/Benefits_Guide.docx" 5. Authentication Flow
- User → Frontend: Click "Login with Microsoft"
- Frontend → Entra ID: Redirect to OAuth consent page
- Entra ID → Frontend: Return authorization code
- Frontend → Backend: POST /api/auth/login { code }
- Backend:
a. Exchange code for Entra ID access token
b. Fetch user profile (email, name, Entra object ID)
c. Check if user exists in PostgreSQL
- If new: Auto-create user with role='user'
- If existing: Update last_login_at d. Generate Backend JWT (exp: 15min) e. Generate Refresh Token (exp: 7 days), store in Redis f. Return { access_token, refresh_token, user }
- Frontend: Store tokens in memory (access) + httpOnly cookie (refresh)
- All API calls: Send access_token in Authorization: Bearer header
- Token refresh: POST /api/auth/refresh { refresh_token }
- API Streaming (Server-Sent Events) Endpoint: GET /api/chat/stream?conversation_id=xxx
FastAPI
async def stream_chat(conversation_id: str): async def event_generator(): async for token in llm_client.stream(messages): yield f"data: {json.dumps({'token': token})}\n\n" yield f"data: {json.dumps({'done': true})}\n\n"
return StreamingResponse(
event_generator(),
media_type="text/event-stream"
)
// Frontend
const eventSource = new EventSource(/api/chat/stream?conversation_id=${id});
eventSource.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.done) {
eventSource.close();
} else {
appendToken(data.token);
}
};
7. Security Checklist
✅ No user passwords (Entra ID SSO only)
✅ Backend validates all JWT signatures
✅ Refresh tokens stored in Redis with TTL
✅ API keys in environment variables, never in database
✅ CORS configured to frontend domain only
✅ Rate limiting deferred to Phase 2
✅ SQL injection protection via ORM (SQLAlchemy parameterized queries)
✅ File upload validation (type, size limits)
✅ Celery task retry limits to prevent runaway jobs
8. Deployment Readiness
Environment Variables: .env.example template with all required keys
Database Migrations: Alembic for PostgreSQL schema versioning
Celery Workers: Separate processes for task execution
Redis: Single instance for sessions + task queue (consider Redis Cluster for scale)
Qdrant: Docker deployment or Qdrant Cloud
Secrets: .env for local dev, environment variables in production
Next Steps
You're now ready to proceed with implementation. The architecture is:
Scalable: Celery for async tasks, Redis caching, Qdrant vector search Secure: Entra ID SSO, JWT auth, no stored credentials Observable: Per-message token tracking, feedback system, admin dashboards Maintainable: Clear separation of concerns, dynamic configuration, auto-retry logic Would you like me to start creating the database migration scripts, API scaffolding, or any other specific component?