Vadym Samoilenko d8b8d11005 docs: rewrite README + add PDF user guide and technical documentation

- README.md: full rewrite reflecting current production state — PKCE auth,
  RAG pipeline, Cloud Run, deployment, env vars, API overview, troubleshooting
- docs/01_Enterprise_AI_Hub_Nexus_User_Guide.pdf
- docs/02_Enterprise_AI_Hub_Nexus_Technical_Documentation.pdf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-05 23:01:47 +00:00


Enterprise AI Hub Nexus

Secure AI platform for knowledge management, RAG chat, and Microsoft 365 productivity — built for OLIVER Agency.

Overview
Features
Architecture
Technology Stack
Repository Structure
RAG Pipeline
Document Processing
Authentication
Local Development
Production Deployment
Environment Variables
Database Migrations
API Overview
Troubleshooting

Overview

Nexus is an enterprise AI platform that provides:

RAG Chat — natural language questions answered from the company knowledge base, with source citations
Personal Assistant — read emails, calendar, OneDrive files and SharePoint content via Microsoft Graph (read-only)
Knowledge Base Management — admin panel to upload, manage and re-index documents
Multi-language support — ask in any language; responses are in the same language
Department & Region scoping — content filtered per team and location

The platform runs in production on a GCE VM (optical-web-1) with Apache as a reverse proxy, Docker Compose for backend services, and a Google Cloud Run microservice for heavy document processing.

Features

Authentication

Microsoft Entra ID (Azure AD) — PKCE SPA flow, no client_secret
JWT tokens (HS256, 8-hour lifetime)
Role-based access: super_admin, content_manager, user
Microsoft 365 consent flow for Personal Assistant mode

RAG Chat

Multi-query expansion — 3 search variants (translated + UK English + US English terminology)
Parallel Qdrant vector search across all variants
LLM reranking (Claude Haiku, 0–10 score) from up to 60 candidates → top 5
Contextual chunk embedding ([Document Title]\n\nchunk text)
Document summary vectors (AI-generated 3–4 sentence summary per document)
Source citations with SharePoint links
SSE streaming responses

Personal Assistant (M365)

Read emails, summarise threads
Read calendar events
List and search OneDrive files
Search SharePoint document libraries
Agentic tool-calling loop (parallel tool execution, up to 5 rounds)

Knowledge Base

Upload: PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, TXT, CSV
Web page scraping (URL → index)
SharePoint document library browser and import
SHA-256 deduplication (identical files skipped)
Upload queue with up to 4 concurrent uploads
Per-document reprocess button (failed or 0-chunk documents)
Re-index All — re-embeds all documents with current pipeline
Stop Re-index — cancels pending reindex tasks
Bulk delete with checkboxes
Stats: total docs, completed, failed, vectors in Qdrant
Sortable table, limit 1000 documents
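
The SHA-256 deduplication mentioned above can be pictured with a minimal sketch. It assumes the hash is taken over the raw file bytes and looked up before processing; DedupIndex is a hypothetical in-memory stand-in for whatever lookup the backend actually performs, not code from this repository.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the file content as a hex string."""
    return hashlib.sha256(data).hexdigest()


class DedupIndex:
    """Hypothetical in-memory stand-in for a content-hash lookup."""

    def __init__(self) -> None:
        self._by_hash: dict[str, str] = {}

    def register(self, filename: str, data: bytes) -> Optional[str]:
        """Return the name of an already indexed identical file, or None if new."""
        digest = sha256_hex(data)
        existing = self._by_hash.get(digest)
        if existing is not None:
            return existing  # identical content seen before: caller can skip it
        self._by_hash[digest] = filename
        return None
```

Because the hash covers content rather than the filename, a renamed copy of an existing file is still detected as a duplicate.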

Admin Panel

User management (invite, role, department, region)
Department management
LLM provider API key configuration
Analytics dashboard
SharePoint source configuration

Architecture

User Browser (Next.js SPA)
        │  HTTPS / SSE
        ▼
Apache 2.4 (reverse proxy + static files)
        │
        ├─── /nexus/*  ──→  /var/www/html/enterprise-ai-hub-nexus/  (static export)
        │
        └─── /api/v1/* ──→  FastAPI :8000 (Docker)
                                │
                    ┌───────────┼───────────┐
                    ▼           ▼           ▼
                Qdrant       PostgreSQL   Redis
               :6333          :5432       :6379
               (vectors)    (metadata)  (Celery)
                    │
                    └── Cloud Run: Doc Processor (extract + chunk)
                    └── Azure AD / MS Graph (auth + M365 tools)

GCE VM optical-web-1 hosts: Apache, FastAPI, Qdrant, PostgreSQL, Redis, Celery worker/beat — all in Docker Compose.

Google Cloud Run (nexus-processor, europe-west1) handles CPU-intensive document extraction and chunking, called via HTTPS from the backend VM.

Technology Stack

Layer	Technology	Details
Frontend	Next.js 14	App Router, static export, `basePath: /nexus`
Frontend	React 18 + TypeScript 5
Frontend	Tailwind CSS + shadcn/ui
Frontend	Zustand	`useAuthStore`, `useChatStore`
Backend	FastAPI 0.115+	Python 3.11+
Backend	SQLAlchemy 2.x (async)	asyncpg driver
Backend	Alembic	14 migrations
Backend	Celery + Redis	Background tasks and scheduler
AI — RAG	OpenAI GPT-5 (`gpt-5.2`)	Streaming answers
AI — Assistant	Anthropic Claude Sonnet (`claude-sonnet-4-6`)	Agentic tool loop
AI — Reranking	Anthropic Claude Haiku (`claude-haiku-4-5`)	Rerank, summaries, query expansion
AI — Summary	Google Gemini (`gemini-3.1-pro-preview`)	Summarisation, planning
AI — Embeddings	OpenAI `text-embedding-3-large`	3072 dimensions
Vector DB	Qdrant 1.12.x	Self-hosted on GCE VM
Relational DB	PostgreSQL 15
Cloud	Google Cloud Run	Document processor microservice
Infrastructure	GCE VM n2d-standard-4	Backend + all services
Auth	Azure AD / Entra ID	PKCE SPA flow
Auth	Microsoft Graph API v1.0	User profile + M365 tools
Web server	Apache 2.4	Reverse proxy + static files
Containers	Docker Compose	`docker-compose.prod.yml`

Repository Structure

enterprise-ai-hub-nexus/
│
├── backend/
│   ├── app/
│   │   ├── api/v1/endpoints/
│   │   │   ├── auth.py              # PKCE login → JWT
│   │   │   ├── chat.py              # SSE streaming RAG + assistant
│   │   │   ├── knowledge.py         # Document upload, list, delete, reindex
│   │   │   ├── users.py             # User CRUD (super_admin)
│   │   │   ├── departments.py       # Department management
│   │   │   └── config.py            # LLM key configuration
│   │   ├── core/
│   │   │   ├── document_processor.py   # Extract, chunk, embed, upsert to Qdrant
│   │   │   ├── llm.py                  # LLMFactory — multi-provider, streaming, tool loop
│   │   │   ├── cloud_run_client.py     # HTTP client for Cloud Run processor
│   │   │   └── web_scraper.py          # URL → text via trafilatura
│   │   ├── rag/
│   │   │   └── retriever.py         # Multi-query expansion, parallel search, LLM rerank
│   │   ├── tools/                   # Personal assistant tools (email, calendar, files)
│   │   ├── models/                  # SQLAlchemy ORM models
│   │   ├── schemas/                 # Pydantic request/response schemas
│   │   ├── config.py                # pydantic-settings (env vars)
│   │   └── database.py              # Async engine + AsyncSessionLocal
│   ├── alembic/versions/            # 14 migration files
│   ├── cloud_run_processor/         # Cloud Run microservice (extract + chunk only)
│   ├── Dockerfile
│   └── requirements.txt
│
├── frontend/
│   ├── app/
│   │   ├── admin/page.tsx           # Admin dashboard
│   │   ├── auth/callback/page.tsx   # OAuth callback handler
│   │   └── chat/page.tsx            # Main chat UI
│   ├── components/
│   │   ├── admin/                   # KnowledgeUploader, SharePointBrowser, UsersTab, etc.
│   │   ├── auth/protected-route.tsx # Auth guard with hydration tracking
│   │   └── chat/chat-interface.tsx  # SSE stream consumer, citations
│   ├── lib/
│   │   ├── api-client.ts            # Typed API client with JWT auto-attach
│   │   └── microsoft-oauth.ts       # PKCE flow + MS token exchange
│   ├── store/                       # useAuthStore, useChatStore (Zustand)
│   └── types/                       # TypeScript types
│
├── docs/
│   ├── 01_Enterprise_AI_Hub_Nexus_User_Guide.pdf
│   ├── 02_Enterprise_AI_Hub_Nexus_Technical_Documentation.pdf
│   └── OLIVER_BRAND_ADAPTATION.md
│
├── docker-compose.prod.yml          # Production services
├── docker-compose.local.yml         # Local dev (db :5433, redis :6380, backend :1222)
└── deploy.sh                        # Full deploy script

RAG Pipeline

The retrieval pipeline lives in backend/app/rag/retriever.py.

User Query
    │
    ▼
Query Expansion (Claude Haiku)
  → Variant 1: normalised / translated
  → Variant 2: UK English (annual leave, holiday, redundancy...)
  → Variant 3: US English (vacation, PTO, layoff...)
    │
    ▼
Parallel Embed (text-embedding-3-large, asyncio.gather)
    │
    ▼
Parallel Qdrant Search (top_k=10 per variant, filters: is_active, department, region)
    │
    ▼
Merge + Dedup by point ID (highest score kept) → up to 60 candidates
    │
    ▼
LLM Reranking (Claude Haiku, score 0–10 per chunk) → top 5
    │
    ▼
LLM Answer (GPT-5, streaming SSE) + source citations
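
The fan-out and merge steps in the diagram above can be sketched roughly as follows. This is an illustration only: search_variant is a hypothetical stub standing in for "embed the variant, then search Qdrant", and the Hit type and sample data are invented for the example. The real logic lives in retriever.py and may differ.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    point_id: str
    score: float
    text: str


async def search_variant(variant: str) -> list[Hit]:
    """Hypothetical stub for embedding one query variant and searching Qdrant."""
    await asyncio.sleep(0)  # stands in for network I/O
    fake_index = {
        "annual leave policy": [Hit("p1", 0.82, "Leave policy"), Hit("p2", 0.40, "Payroll")],
        "vacation policy": [Hit("p1", 0.91, "Leave policy"), Hit("p3", 0.55, "PTO rules")],
    }
    return fake_index.get(variant, [])


def merge_hits(result_sets: list[list[Hit]]) -> list[Hit]:
    """Merge hits from all variants, keeping the highest score per point ID."""
    best: dict[str, Hit] = {}
    for hits in result_sets:
        for hit in hits:
            current = best.get(hit.point_id)
            if current is None or hit.score > current.score:
                best[hit.point_id] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


async def retrieve_candidates(variants: list[str]) -> list[Hit]:
    # Run the searches for all variants concurrently rather than one by one.
    result_sets = await asyncio.gather(*(search_variant(v) for v in variants))
    return merge_hits(list(result_sets))


candidates = asyncio.run(retrieve_candidates(["annual leave policy", "vacation policy"]))
```

Here p1 is returned by both variants and survives once with its higher score, which is the "dedup by point ID (highest score kept)" behaviour described above.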

Key design decisions:

Multi-query expansion bridges UK/US terminology differences in HR documents
Reranking replaces binary yes/no grading with continuous relevance scores
Each chunk stored with original text; contextualised version used only for embedding
Document summary vectors improve topic-level discovery
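
To illustrate why continuous scores matter, here is a small sketch of score-based selection. It assumes a scorer that returns a 0 to 10 relevance value per chunk; keyword_overlap_score is a toy stand-in for the Claude Haiku call, and both function names are hypothetical.

```python
from typing import Callable


def rerank(
    query: str,
    chunks: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> list[tuple[str, float]]:
    """Score every candidate on a 0-10 scale and keep the top_n best."""
    scored = []
    for chunk in chunks:
        # Clamp so a misbehaving scorer cannot leave the documented range.
        score = max(0.0, min(10.0, float(score_fn(query, chunk))))
        scored.append((chunk, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:top_n]


def keyword_overlap_score(query: str, chunk: str) -> float:
    """Toy stand-in for the LLM scorer: share of query words found in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    if not q_words:
        return 0.0
    return 10.0 * len(q_words & c_words) / len(q_words)
```

With a binary yes/no grade, all accepted chunks are equally ranked and there is no principled way to cut 60 candidates down to 5; a numeric score gives a total order to truncate.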

Document Processing

backend/app/core/document_processor.py — two-phase design:

Phase 1 — extract_and_chunk (runs on Cloud Run, no Qdrant/OpenAI needed):

PDF (text-based): MarkItDown
PDF (scanned): LlamaParse (cloud OCR), falls back to MarkItDown
DOCX, XLSX, PPTX: MarkItDown
TXT/CSV: direct UTF-8 decode
Chunk size: 1000 chars, overlap: 200 chars
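
The chunk size and overlap settings can be illustrated with a plain sliding window. This is a sketch under the assumption of simple character-based slicing; the splitter actually used by the processor is not described here and may respect sentence or paragraph boundaries. chunk_text is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size chars that overlap by overlap chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each window starts 800 characters after the previous one, so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.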

Phase 2 — embed_and_upsert (runs on backend VM):

Delete existing vectors for document (re-ingestion safe)
Contextualise chunks: [Document Title]\n\nchunk text
Embed in parallel batches of 100 (asyncio.gather)
Generate document summary (Claude Haiku, 3–4 sentences)
Upsert all points + summary vector to Qdrant
Update DB status: COMPLETED / FAILED
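
The contextualisation and batching steps above might look like the sketch below. It assumes the square brackets in the format are literal and that the original chunk text is what gets stored while the contextualised string is what gets embedded, as the design notes state. The function names and the payload keys are hypothetical.

```python
from typing import Iterator


def contextualise(title: str, chunk: str) -> str:
    """Prefix a chunk with its document title for embedding."""
    return f"[{title}]\n\n{chunk}"


def batched(items: list[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_payloads(title: str, chunks: list[str]) -> list[dict[str, str]]:
    """Pair each original chunk (stored) with its contextualised text (embedded)."""
    return [{"text": c, "embed_input": contextualise(title, c)} for c in chunks]
```

Each batch produced by batched would be sent as one embedding request, and the requests for all batches can then be awaited together with asyncio.gather.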

Background processing — all document operations use FastAPI BackgroundTasks with an independent DB session to avoid StaleDataError from long-lived HTTP sessions.

Authentication

PKCE OAuth 2.0 flow — no client_secret:

Browser generates code_verifier + code_challenge (S256)
Redirect to Azure AD with code_challenge
Azure AD returns auth_code to /auth/callback
Browser exchanges auth_code + code_verifier → MS access_token (direct to Microsoft)
Browser sends ms_access_token to POST /api/v1/auth/login
Backend calls MS Graph /me to validate token and get user profile
Backend returns signed app JWT
All subsequent API calls use Authorization: Bearer <JWT>
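
The first step of the flow, generating code_verifier and code_challenge with the S256 method, follows RFC 7636. In Nexus this happens in the browser (frontend/lib/microsoft-oauth.ts); the Python sketch below only illustrates the derivation, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random URL-safe verifier; 32 random bytes encode to 43 characters."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
```

Only the challenge is sent in the initial redirect. The verifier stays in the browser until the token exchange, which is what lets the flow work without a client_secret.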

For Personal Assistant mode, the MS access_token is passed in the request body as graph_token and used for Graph API calls on-demand.
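
The signed app JWT returned by the backend uses HS256 with an 8-hour lifetime. The standard-library sketch below shows what signing and verifying such a token involves. It is illustrative only: the claim names (sub, role) and function names are assumptions, and a production service would normally rely on a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(subject: str, role: str, secret: str,
                 expire_minutes: int = 480, now: Optional[int] = None) -> str:
    """Build an HS256 JWT that expires expire_minutes after issue time."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "iat": issued, "exp": issued + expire_minutes * 60}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode()) for p in (header, payload)]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: str, now: Optional[int] = None) -> dict:
    """Return the payload if the signature is valid and the token is unexpired."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = int(time.time()) if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The signature is checked before the payload is trusted, and the comparison uses hmac.compare_digest to avoid timing side channels.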

Local Development

Prerequisites

Docker & Docker Compose
Node.js 18+
Python 3.11+ (optional, for running backend outside Docker)

1. Clone

git clone git@bitbucket.org:zlalani/enterprise-ai-hub-nexus.git
cd enterprise-ai-hub-nexus

2. Configure environment

cp backend/.env.example backend/.env
# Edit backend/.env with real API keys (see Environment Variables section)

3. Start backend services

docker-compose -f docker-compose.local.yml up -d
# Backend: http://localhost:1222
# PostgreSQL: localhost:5433
# Redis: localhost:6380
# Qdrant: http://localhost:6333

4. Apply migrations

docker exec backend alembic upgrade head

5. Start frontend

cd frontend
npm install
npm run dev
# http://localhost:3000/nexus

6. API docs

http://localhost:1222/docs

Production Deployment

Production runs on GCE VM optical-web-1. Deploy with:

# On the server
cd /opt/enterprise-ai-hub-nexus
./deploy.sh

deploy.sh performs:

git pull origin main
cd frontend && npm ci && npm run build
rsync out/ /var/www/html/enterprise-ai-hub-nexus/
docker-compose -f docker-compose.prod.yml up -d --build backend celery-worker
docker exec backend alembic upgrade head

Docker Compose services (prod)

Service	Image	Port	Description
backend	./Dockerfile	8000	FastAPI + uvicorn
celery-worker	./Dockerfile	—	Celery worker (healthcheck: inspect ping)
celery-beat	./Dockerfile	—	Celery scheduler
redis	redis:7-alpine	6379	Broker + cache
qdrant	qdrant/qdrant:v1.12.1	6333, 6334	Vector DB

Cloud Run Processor

The document processor is deployed separately:

cd backend/cloud_run_processor
gcloud run deploy nexus-processor \
  --region europe-west1 \
  --timeout 900 \
  --memory 2Gi \
  --no-allow-unauthenticated

URL: https://nexus-processor-818629422283.europe-west1.run.app

The backend VM service account needs roles/run.invoker on this service.

Environment Variables

Variable	Required	Description
`DATABASE_URL`	Yes	`postgresql+asyncpg://user:pass@host/db`
`REDIS_URL`	Yes	`redis://redis:6379/0`
`SECRET_KEY`	Yes	JWT signing secret (32+ random bytes)
`AZURE_CLIENT_ID`	Yes	Azure AD app client ID
`AZURE_TENANT_ID`	Yes	Azure AD tenant ID
`OPENAI_API_KEY`	Yes	OpenAI key (RAG + embeddings)
`ANTHROPIC_API_KEY`	Yes	Anthropic key (Claude Sonnet + Haiku)
`GOOGLE_API_KEY`	No	Google Gemini key (summary / planning)
`QDRANT_URL`	Yes	`http://qdrant:6333`
`CLOUD_RUN_PROCESSOR_URL`	No	Cloud Run URL; if empty uses local processor
`LLAMAPARSE_API_KEY`	No	LlamaParse key for scanned PDF OCR
`UPLOAD_DIR`	No	File storage directory (default: `/tmp/uploads`)
`CHUNK_SIZE`	No	Chunk size in chars (default: `1000`)
`CHUNK_OVERLAP`	No	Chunk overlap in chars (default: `200`)
`ACCESS_TOKEN_EXPIRE_MINUTES`	No	JWT lifetime (default: `480` = 8 hours)

Note: Azure AD credentials (client ID, tenant ID) are also hardcoded as defaults in frontend/lib/microsoft-oauth.ts and next.config.mjs since the static frontend has no runtime env.
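
As a rough illustration of how the table above translates into startup validation, here is a standard-library sketch. The backend itself uses pydantic-settings in backend/app/config.py, so this is not the actual implementation; Settings, REQUIRED and load_settings are hypothetical names, and only the defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variables marked "Yes" in the Required column.
REQUIRED = (
    "DATABASE_URL", "REDIS_URL", "SECRET_KEY", "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID", "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "QDRANT_URL",
)


@dataclass(frozen=True)
class Settings:
    required: dict[str, str]
    upload_dir: str
    chunk_size: int
    chunk_overlap: int
    access_token_expire_minutes: int
    cloud_run_processor_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail fast on missing required variables and apply documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        required={name: env[name] for name in REQUIRED},
        upload_dir=env.get("UPLOAD_DIR", "/tmp/uploads"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "480")),
        # An empty value means "use the local processor", so normalise it to None.
        cloud_run_processor_url=env.get("CLOUD_RUN_PROCESSOR_URL") or None,
    )
```

Passing the environment as a parameter keeps the loader easy to test with a plain dictionary.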

Database Migrations

# Apply all pending migrations
docker exec backend alembic upgrade head

# Check current revision
docker exec backend alembic current

# Generate new migration
docker exec backend alembic revision --autogenerate -m "description"

# Downgrade one step
docker exec backend alembic downgrade -1

Known gotcha: Do NOT use sa.Enum(create_type=False) with asyncpg — it is silently ignored. Let sa.Enum() inside op.create_table() handle type creation naturally. Do not add explicit CREATE TYPE SQL in the same migration.

API Overview

Method	Path	Description
POST	`/api/v1/auth/login`	MS access_token → app JWT
POST	`/api/v1/chat/stream`	SSE streaming chat (RAG / assistant / general)
GET	`/api/v1/admin/knowledge/documents`	List documents (max 1000)
POST	`/api/v1/admin/knowledge/upload`	Upload document (202, async processing)
POST	`/api/v1/admin/knowledge/scrape`	Scrape URL and index
POST	`/api/v1/admin/knowledge/stats`	Doc counts + Qdrant vector count
POST	`/api/v1/admin/knowledge/documents/{id}/reprocess`	Re-queue single document
POST	`/api/v1/admin/knowledge/documents/bulk-delete`	Delete multiple docs + vectors
POST	`/api/v1/admin/knowledge/reindex-all`	Re-queue all completed docs
POST	`/api/v1/admin/knowledge/reindex-stop`	Cancel pending reindex tasks
DELETE	`/api/v1/admin/knowledge/documents/{id}`	Delete doc + Qdrant vectors
GET	`/api/v1/admin/users`	List users (super_admin only)
GET	`/api/v1/admin/departments`	List departments

Full interactive docs at /docs (Swagger UI).
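
Clients of the streaming chat endpoint need to parse Server-Sent Events. The sketch below shows the generic wire-format handling (data lines, blank-line event boundaries, comment lines) on an iterable of text lines. The shape of the payloads that Nexus sends inside each event is not documented here, so the example treats them as opaque strings; iter_sse_data is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            buffer.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # An event not terminated by a blank line is discarded.


sample = ["data: Hello\n", "\n", ": ping\n", "data: line1\n", "data: line2\n", "\n", "data: lost\n"]
events = list(iter_sse_data(sample))
```

Multiple data lines inside one event are joined with newlines, and the unterminated final event is dropped.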

Troubleshooting

Backend won't start:

docker logs backend
docker-compose -f docker-compose.prod.yml restart backend

Reindex failing with AttributeError: Ensure you are on the latest commit — an early self.embeddings typo was fixed in document_processor.py.

Documents stuck in Processing: The file may be too large to process within the Cloud Run timeout (900 s). Check the Cloud Run logs. For very large files, consider increasing --timeout or splitting the file.

Qdrant vector count shows 0: Ensure Qdrant is running and QDRANT_URL is correct. The stats endpoint uses count() (compatible with Qdrant 1.12.x).

Azure AD login fails: Ensure the Azure AD app is configured as SPA platform (not Web). No client_secret should be needed. Check that redirect URIs include your frontend callback URL.

Frontend shows old version after deploy: Clear browser cache or do a hard refresh (Cmd+Shift+R). Apache serves static files without cache-busting headers by default.

Personal Assistant not working: Click "Connect Microsoft 365" in the sidebar and accept the consent dialog. The MS access_token is required for Graph API calls.

Documentation

Full documentation is in the docs/ directory:

docs/01_Enterprise_AI_Hub_Nexus_User_Guide.pdf — end-user guide
docs/02_Enterprise_AI_Hub_Nexus_Technical_Documentation.pdf — developer reference with architecture diagrams, RAG pipeline, Cloud Run setup, API reference

Built for OLIVER Agency — March 2026
