Enterprise AI Hub Nexus
Secure AI platform for knowledge management, RAG chat, and Microsoft 365 productivity — built for OLIVER Agency.
Table of Contents
- Overview
- Features
- Architecture
- Technology Stack
- Repository Structure
- RAG Pipeline
- Document Processing
- Authentication
- Local Development
- Production Deployment
- Environment Variables
- Database Migrations
- API Overview
- Troubleshooting
Overview
Nexus is an enterprise AI platform that provides:
- RAG Chat — natural language questions answered from the company knowledge base, with source citations
- Personal Assistant — read emails, calendar, OneDrive files and SharePoint content via Microsoft Graph (read-only)
- Knowledge Base Management — admin panel to upload, manage and re-index documents
- Multi-language support — ask in any language; responses are in the same language
- Department & Region scoping — content filtered per team and location
The platform runs in production on a GCE VM (optical-web-1) with Apache as a reverse proxy, Docker Compose for backend services, and a Google Cloud Run microservice for heavy document processing.
Features
Authentication
- Microsoft Entra ID (Azure AD) — PKCE SPA flow, no client_secret
- JWT tokens (HS256, 8-hour lifetime)
- Role-based access: super_admin, content_manager, user
- Microsoft 365 consent flow for Personal Assistant mode
RAG Chat
- Multi-query expansion — 3 search variants (translated + UK English + US English terminology)
- Parallel Qdrant vector search across all variants
- LLM reranking (Claude Haiku, 0–10 score) from up to 60 candidates → top 5
- Contextual chunk embedding ([Document Title]\n\nchunk text)
- Document summary vectors (AI-generated 3–4 sentence summary per document)
- Source citations with SharePoint links
- SSE streaming responses
Personal Assistant (M365)
- Read emails, summarise threads
- Read calendar events
- List and search OneDrive files
- Search SharePoint document libraries
- Agentic tool-calling loop (parallel tool execution, up to 5 rounds)
Knowledge Base
- Upload: PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, TXT, CSV
- Web page scraping (URL → index)
- SharePoint document library browser and import
- SHA-256 deduplication (identical files skipped)
- Upload queue with up to 4 concurrent uploads
- Per-document reprocess button (failed or 0-chunk documents)
- Re-index All — re-embeds all documents with current pipeline
- Stop Re-index — cancels pending reindex tasks
- Bulk delete with checkboxes
- Stats: total docs, completed, failed, vectors in Qdrant
- Sortable table, limit 1000 documents
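The SHA-256 deduplication listed above can be pictured with the minimal sketch below. The `seen_hashes` set and both function names are assumptions made for illustration; the real backend presumably looks up stored hashes in its database rather than in memory.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def should_index(data: bytes, seen_hashes: set[str]) -> bool:
    """Return True for new content and False for byte-identical content.

    `seen_hashes` is a hypothetical stand-in for a lookup against stored hashes.
    """
    digest = sha256_of(data)
    if digest in seen_hashes:
        return False  # identical file already indexed, skip it
    seen_hashes.add(digest)
    return True
```

Because the hash covers the raw bytes, two files with the same content but different names are treated as duplicates, while a file that changes by a single byte is indexed again.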
Admin Panel
- User management (invite, role, department, region)
- Department management
- LLM provider API key configuration
- Analytics dashboard
- SharePoint source configuration
Architecture
User Browser (Next.js SPA)
        │  HTTPS / SSE
        ▼
Apache 2.4 (reverse proxy + static files)
        │
        ├─── /nexus/*   ──→  /var/www/html/enterprise-ai-hub-nexus/ (static export)
        │
        └─── /api/v1/*  ──→  FastAPI :8000 (Docker)
                                    │
                        ┌───────────┼───────────┐
                        ▼           ▼           ▼
                     Qdrant     PostgreSQL    Redis
                     :6333        :5432       :6379
                   (vectors)   (metadata)   (Celery)
                                    │
                                    └── Cloud Run: Doc Processor (extract + chunk)
                                    └── Azure AD / MS Graph (auth + M365 tools)
GCE VM optical-web-1 hosts: Apache, FastAPI, Qdrant, PostgreSQL, Redis, Celery worker/beat — all in Docker Compose.
Google Cloud Run (nexus-processor, europe-west1) handles CPU-intensive document extraction and chunking, called via HTTPS from the backend VM.
Technology Stack
| Layer | Technology | Details |
|---|---|---|
| Frontend | Next.js 14 | App Router, static export, basePath: /nexus |
| Frontend | React 18 + TypeScript 5 | |
| Frontend | Tailwind CSS + shadcn/ui | |
| Frontend | Zustand | useAuthStore, useChatStore |
| Backend | FastAPI 0.115+ | Python 3.11+ |
| Backend | SQLAlchemy 2.x (async) | asyncpg driver |
| Backend | Alembic | 14 migrations |
| Backend | Celery + Redis | Background tasks and scheduler |
| AI - RAG | OpenAI GPT-5 (gpt-5.2) | Streaming answers |
| AI - Assistant | Anthropic Claude Sonnet (claude-sonnet-4-6) | Agentic tool loop |
| AI - Reranking | Anthropic Claude Haiku (claude-haiku-4-5) | Rerank, summaries, query expansion |
| AI - Summary | Google Gemini (gemini-3.1-pro-preview) | Summarisation, planning |
| AI - Embeddings | OpenAI text-embedding-3-large | 3072 dimensions |
| Vector DB | Qdrant 1.12.x | Self-hosted on GCE VM |
| Relational DB | PostgreSQL 15 | |
| Cloud | Google Cloud Run | Document processor microservice |
| Infrastructure | GCE VM n2d-standard-4 | Backend + all services |
| Auth | Azure AD / Entra ID | PKCE SPA flow |
| Auth | Microsoft Graph API v1.0 | User profile + M365 tools |
| Web server | Apache 2.4 | Reverse proxy + static files |
| Containers | Docker Compose | docker-compose.prod.yml |
Repository Structure
enterprise-ai-hub-nexus/
│
├── backend/
│ ├── app/
│ │ ├── api/v1/endpoints/
│ │ │ ├── auth.py # PKCE login → JWT
│ │ │ ├── chat.py # SSE streaming RAG + assistant
│ │ │ ├── knowledge.py # Document upload, list, delete, reindex
│ │ │ ├── users.py # User CRUD (super_admin)
│ │ │ ├── departments.py # Department management
│ │ │ └── config.py # LLM key configuration
│ │ ├── core/
│ │ │ ├── document_processor.py # Extract, chunk, embed, upsert to Qdrant
│ │ │ ├── llm.py # LLMFactory — multi-provider, streaming, tool loop
│ │ │ ├── cloud_run_client.py # HTTP client for Cloud Run processor
│ │ │ └── web_scraper.py # URL → text via trafilatura
│ │ ├── rag/
│ │ │ └── retriever.py # Multi-query expansion, parallel search, LLM rerank
│ │ ├── tools/ # Personal assistant tools (email, calendar, files)
│ │ ├── models/ # SQLAlchemy ORM models
│ │ ├── schemas/ # Pydantic request/response schemas
│ │ ├── config.py # pydantic-settings (env vars)
│ │ └── database.py # Async engine + AsyncSessionLocal
│ ├── alembic/versions/ # 14 migration files
│ ├── cloud_run_processor/ # Cloud Run microservice (extract + chunk only)
│ ├── Dockerfile
│ └── requirements.txt
│
├── frontend/
│ ├── app/
│ │ ├── admin/page.tsx # Admin dashboard
│ │ ├── auth/callback/page.tsx # OAuth callback handler
│ │ └── chat/page.tsx # Main chat UI
│ ├── components/
│ │ ├── admin/ # KnowledgeUploader, SharePointBrowser, UsersTab, etc.
│ │ ├── auth/protected-route.tsx # Auth guard with hydration tracking
│ │ └── chat/chat-interface.tsx # SSE stream consumer, citations
│ ├── lib/
│ │ ├── api-client.ts # Typed API client with JWT auto-attach
│ │ └── microsoft-oauth.ts # PKCE flow + MS token exchange
│ ├── store/ # useAuthStore, useChatStore (Zustand)
│ └── types/ # TypeScript types
│
├── docs/
│ ├── 01_Enterprise_AI_Hub_Nexus_User_Guide.pdf
│ ├── 02_Enterprise_AI_Hub_Nexus_Technical_Documentation.pdf
│ └── OLIVER_BRAND_ADAPTATION.md
│
├── docker-compose.prod.yml # Production services
├── docker-compose.local.yml # Local dev (db :5433, redis :6380, backend :1222)
└── deploy.sh # Full deploy script
RAG Pipeline
The retrieval pipeline lives in backend/app/rag/retriever.py.
User Query
│
▼
Query Expansion (Claude Haiku)
→ Variant 1: normalised / translated
→ Variant 2: UK English (annual leave, holiday, redundancy...)
→ Variant 3: US English (vacation, PTO, layoff...)
│
▼
Parallel Embed (text-embedding-3-large, asyncio.gather)
│
▼
Parallel Qdrant Search (top_k=10 per variant, filters: is_active, department, region)
│
▼
Merge + Dedup by point ID (highest score kept) → up to 60 candidates
│
▼
LLM Reranking (Claude Haiku, score 0–10 per chunk) → top 5
│
▼
LLM Answer (GPT-5, streaming SSE) + source citations
Key design decisions:
- Multi-query expansion bridges UK/US terminology differences in HR documents
- Reranking replaces binary yes/no grading with continuous relevance scores
- Each chunk stored with original text; contextualised version used only for embedding
- Document summary vectors improve topic-level discovery
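The merge, dedup, and top-5 selection steps can be sketched as follows. This is an illustration under assumptions, not the code in retriever.py: the `Hit` dataclass, `merge_hits`, and `top_reranked` are hypothetical names, and the reranker scores are passed in as a plain dictionary instead of coming from a Claude Haiku call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    point_id: str  # Qdrant point ID
    score: float   # vector similarity score
    text: str      # original chunk text


def merge_hits(result_lists: list[list[Hit]], max_candidates: int = 60) -> list[Hit]:
    """Merge per-variant results, keeping the best-scoring hit per point ID."""
    best: dict[str, Hit] = {}
    for hits in result_lists:
        for hit in hits:
            current = best.get(hit.point_id)
            if current is None or hit.score > current.score:
                best[hit.point_id] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:max_candidates]


def top_reranked(candidates: list[Hit], llm_scores: dict[str, float], k: int = 5) -> list[Hit]:
    """Order candidates by their 0-10 reranker score and keep the top k.

    Candidates without a score are treated as 0 (an assumption of this sketch).
    """
    ranked = sorted(candidates, key=lambda h: llm_scores.get(h.point_id, 0.0), reverse=True)
    return ranked[:k]
```

Deduplicating before reranking means each chunk is scored by the LLM at most once, even when several query variants retrieve it.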
Document Processing
backend/app/core/document_processor.py — two-phase design:
Phase 1 — extract_and_chunk (runs on Cloud Run, no Qdrant/OpenAI needed):
- PDF (text-based): MarkItDown
- PDF (scanned): LlamaParse (cloud OCR), falls back to MarkItDown
- DOCX, XLSX, PPTX: MarkItDown
- TXT/CSV: direct UTF-8 decode
- Chunk size: 1000 chars, overlap: 200 chars
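To make the chunk size and overlap concrete, here is a minimal fixed-window splitter. It is a sketch only: `chunk_text` is a hypothetical name, and the real processor may split on sentence or paragraph boundaries instead of raw character offsets.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size chars that share `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the defaults, a 2500-character document yields three chunks starting at offsets 0, 800, and 1600, so the last 200 characters of each chunk reappear at the start of the next.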
Phase 2 — embed_and_upsert (runs on backend VM):
- Delete existing vectors for document (re-ingestion safe)
- Contextualise chunks: [Document Title]\n\nchunk text
- Embed in parallel batches of 100 (asyncio.gather)
- Generate document summary (Claude Haiku, 3–4 sentences)
- Upsert all points + summary vector to Qdrant
- Update DB status: COMPLETED / FAILED
Background processing — all document operations use FastAPI BackgroundTasks with an independent DB session to avoid StaleDataError from long-lived HTTP sessions.
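The contextualise and batch-embed steps of Phase 2 can be sketched as below. Everything here is illustrative: `contextualise`, `embed_in_batches`, and `fake_embed` are hypothetical names, the square brackets around the title are assumed to be literal, and `fake_embed` stands in for a real call to the embeddings API.

```python
import asyncio
from typing import Awaitable, Callable

EmbedFn = Callable[[list[str]], Awaitable[list[list[float]]]]


def contextualise(title: str, chunk: str) -> str:
    """Prefix a chunk with its document title; used only for embedding."""
    return f"[{title}]\n\n{chunk}"


async def embed_in_batches(
    texts: list[str], embed_fn: EmbedFn, batch_size: int = 100
) -> list[list[float]]:
    """Embed texts in concurrent batches and return vectors in input order."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    # asyncio.gather runs the batches concurrently and preserves their order.
    results = await asyncio.gather(*(embed_fn(batch) for batch in batches))
    return [vector for batch in results for vector in batch]


async def fake_embed(batch: list[str]) -> list[list[float]]:
    """Placeholder embedder: one tiny vector per text (not a real model)."""
    return [[float(len(text))] for text in batch]
```

Usage: `asyncio.run(embed_in_batches(texts, fake_embed))` returns one vector per input text. The original chunk text, not the contextualised string, is what would be stored alongside each vector.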
Authentication
PKCE OAuth 2.0 flow — no client_secret:
- Browser generates code_verifier + code_challenge (S256)
- Redirect to Azure AD with code_challenge
- Azure AD returns auth_code to /auth/callback
- Browser exchanges auth_code + code_verifier → MS access_token (direct to Microsoft)
- Browser sends ms_access_token to POST /api/v1/auth/login
- Backend calls MS Graph /me to validate token and get user profile
- Backend returns signed app JWT
- All subsequent API calls use Authorization: Bearer <JWT>
For Personal Assistant mode, the MS access_token is passed in the request body as graph_token and used for Graph API calls on demand.
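The first step of the flow, deriving code_challenge from code_verifier with the S256 method, follows RFC 7636. In Nexus this happens in the browser (frontend/lib/microsoft-oauth.ts); the Python below is only a sketch of the same computation, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random URL-safe verifier; 32 bytes encode to 43 characters."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The verifier never leaves the browser until the token exchange, which is why the flow needs no client_secret: only the party that created the challenge can present the matching verifier.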
Local Development
Prerequisites
- Docker & Docker Compose
- Node.js 18+
- Python 3.11+ (optional, for running backend outside Docker)
1. Clone
git clone git@bitbucket.org:zlalani/enterprise-ai-hub-nexus.git
cd enterprise-ai-hub-nexus
2. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with real API keys (see Environment Variables section)
3. Start backend services
docker-compose -f docker-compose.local.yml up -d
# Backend: http://localhost:1222
# PostgreSQL: localhost:5433
# Redis: localhost:6380
# Qdrant: http://localhost:6333
4. Apply migrations
docker exec backend alembic upgrade head
5. Start frontend
cd frontend
npm install
npm run dev
# http://localhost:3000/nexus
6. API docs
http://localhost:1222/docs
Production Deployment
Production runs on GCE VM optical-web-1. Deploy with:
# On the server
cd /opt/enterprise-ai-hub-nexus
./deploy.sh
deploy.sh performs:
- git pull origin main
- cd frontend && npm ci && npm run build
- rsync out/ /var/www/html/enterprise-ai-hub-nexus/
- docker-compose -f docker-compose.prod.yml up -d --build backend celery-worker
- docker exec backend alembic upgrade head
Docker Compose services (prod)
| Service | Image | Port | Description |
|---|---|---|---|
| backend | ./Dockerfile | 8000 | FastAPI + uvicorn |
| celery-worker | ./Dockerfile | — | Celery worker (healthcheck: inspect ping) |
| celery-beat | ./Dockerfile | — | Celery scheduler |
| redis | redis:7-alpine | 6379 | Broker + cache |
| qdrant | qdrant/qdrant:v1.12.1 | 6333, 6334 | Vector DB |
Cloud Run Processor
Document processor deployed separately:
cd backend/cloud_run_processor
gcloud run deploy nexus-processor \
--region europe-west1 \
--timeout 900 \
--memory 2Gi \
--no-allow-unauthenticated
URL: https://nexus-processor-818629422283.europe-west1.run.app
The backend VM service account needs roles/run.invoker on this service.
Environment Variables
| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | postgresql+asyncpg://user:pass@host/db |
| REDIS_URL | Yes | redis://redis:6379/0 |
| SECRET_KEY | Yes | JWT signing secret (32+ random bytes) |
| AZURE_CLIENT_ID | Yes | Azure AD app client ID |
| AZURE_TENANT_ID | Yes | Azure AD tenant ID |
| OPENAI_API_KEY | Yes | OpenAI key (RAG + embeddings) |
| ANTHROPIC_API_KEY | Yes | Anthropic key (Claude Sonnet + Haiku) |
| GOOGLE_API_KEY | No | Google Gemini key (summary / planning) |
| QDRANT_URL | Yes | http://qdrant:6333 |
| CLOUD_RUN_PROCESSOR_URL | No | Cloud Run URL; if empty uses local processor |
| LLAMAPARSE_API_KEY | No | LlamaParse key for scanned PDF OCR |
| UPLOAD_DIR | No | File storage directory (default: /tmp/uploads) |
| CHUNK_SIZE | No | Chunk size in chars (default: 1000) |
| CHUNK_OVERLAP | No | Chunk overlap in chars (default: 200) |
| ACCESS_TOKEN_EXPIRE_MINUTES | No | JWT lifetime (default: 480 = 8 hours) |
Note: Azure AD credentials (client ID, tenant ID) are also hardcoded as defaults in frontend/lib/microsoft-oauth.ts and next.config.mjs, since the static frontend has no runtime environment.
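As a quick pre-flight check before starting the backend, the required variables and documented defaults from the table can be validated with a short script. This is a dependency-free sketch, not the project's backend/app/config.py (which uses pydantic-settings); `load_settings`, `REQUIRED`, and `DEFAULTS` are hypothetical names.

```python
import os
from typing import Optional

# Variables marked "Yes" in the table above.
REQUIRED = (
    "DATABASE_URL", "REDIS_URL", "SECRET_KEY", "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID", "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "QDRANT_URL",
)

# Optional variables with the defaults documented in the table.
DEFAULTS = {
    "UPLOAD_DIR": "/tmp/uploads",
    "CHUNK_SIZE": "1000",
    "CHUNK_OVERLAP": "200",
    "ACCESS_TOKEN_EXPIRE_MINUTES": "480",
}


def load_settings(env: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Return settings as strings, raising if a required variable is missing or empty."""
    env = dict(os.environ) if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name, default)
    return settings
```

Calling `load_settings()` with no argument reads the current process environment; passing a dictionary makes the check easy to test.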
Database Migrations
# Apply all pending migrations
docker exec backend alembic upgrade head
# Check current revision
docker exec backend alembic current
# Generate new migration
docker exec backend alembic revision --autogenerate -m "description"
# Downgrade one step
docker exec backend alembic downgrade -1
Known gotcha: Do NOT use sa.Enum(create_type=False) with asyncpg — it is silently ignored. Let sa.Enum() inside op.create_table() handle type creation naturally. Do not add explicit CREATE TYPE SQL in the same migration.
API Overview
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/auth/login | MS access_token → app JWT |
| POST | /api/v1/chat/stream | SSE streaming chat (RAG / assistant / general) |
| GET | /api/v1/admin/knowledge/documents | List documents (max 1000) |
| POST | /api/v1/admin/knowledge/upload | Upload document (202, async processing) |
| POST | /api/v1/admin/knowledge/scrape | Scrape URL and index |
| POST | /api/v1/admin/knowledge/stats | Doc counts + Qdrant vector count |
| POST | /api/v1/admin/knowledge/documents/{id}/reprocess | Re-queue single document |
| POST | /api/v1/admin/knowledge/documents/bulk-delete | Delete multiple docs + vectors |
| POST | /api/v1/admin/knowledge/reindex-all | Re-queue all completed docs |
| POST | /api/v1/admin/knowledge/reindex-stop | Cancel pending reindex tasks |
| DELETE | /api/v1/admin/knowledge/documents/{id} | Delete doc + Qdrant vectors |
| GET | /api/v1/admin/users | List users (super_admin only) |
| GET | /api/v1/admin/departments | List departments |
Full interactive docs at /docs (Swagger UI).
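A client of POST /api/v1/chat/stream has to split the response into Server-Sent Events. The sketch below parses the standard SSE wire format from an iterable of text lines. It makes no assumption about what each event's payload contains (that is defined by the backend and not documented here), and `iter_sse_data` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event.

    Only `data:` fields are handled; `event:`, `id:` and `retry:` are ignored.
    An event still open when the stream ends is dropped, as the SSE spec requires.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
```

Any HTTP client that can iterate over response lines can feed this generator; remember to send the app JWT in the Authorization: Bearer header when opening the stream.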
Troubleshooting
Backend won't start:
docker logs backend
docker-compose -f docker-compose.prod.yml restart backend
Reindex failing with AttributeError:
Ensure you are on the latest commit — an early self.embeddings typo was fixed in document_processor.py.
Documents stuck in Processing:
File may be too large for Cloud Run timeout (900s). Check Cloud Run logs. For very large files, consider increasing --timeout or splitting the file.
Qdrant vector count shows 0:
Ensure Qdrant is running and QDRANT_URL is correct. The stats endpoint uses count() (compatible with Qdrant 1.12.x).
Azure AD login fails:
Ensure the Azure AD app is configured as SPA platform (not Web). No client_secret should be needed. Check that redirect URIs include your frontend callback URL.
Frontend shows old version after deploy:
Clear the browser cache or do a hard refresh (Cmd+Shift+R on macOS, Ctrl+Shift+R on Windows/Linux). Apache serves static files without cache-busting headers by default.
Personal Assistant not working:
Click "Connect Microsoft 365" in the sidebar and accept the consent dialog. The MS access_token is required for Graph API calls.
Documentation
Full documentation is in the docs/ directory:
- docs/01_Enterprise_AI_Hub_Nexus_User_Guide.pdf: end-user guide
- docs/02_Enterprise_AI_Hub_Nexus_Technical_Documentation.pdf: developer reference with architecture diagrams, RAG pipeline, Cloud Run setup, API reference
Built for OLIVER Agency — March 2026