Enterprise AI Hub "Nexus" - Backend
FastAPI backend with Microsoft Entra ID authentication, PostgreSQL, Redis, and Qdrant integration.
🚀 Quick Start
1. Set up environment variables
Create a .env file in the backend directory:
cp .env.example .env
Edit .env and fill in your credentials:
- Microsoft Entra ID credentials
- Database password
- JWT secret (generate with: openssl rand -hex 32)
- LLM API keys (optional for Phase 2)
2. Run with Docker Compose
From the project root:
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f backend
# Stop services
docker-compose down
3. Run database migrations
# Enter backend container
docker exec -it nexus-backend bash
# Run migrations
alembic upgrade head
# Seed initial data (regions/departments)
python seed_data.py
# Exit container
exit
4. Access the API
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/api/v1/health
- ReDoc: http://localhost:8000/redoc
📋 API Endpoints
Authentication
- POST /api/v1/auth/login - Login with Microsoft Entra ID
- POST /api/v1/auth/refresh - Refresh access token
Health
- GET /api/v1/health - System health check
🔐 Authentication Flow
- Frontend redirects to Microsoft Entra ID login
- User authenticates and grants permissions
- Entra ID redirects back with authorization code
- Frontend sends code to /api/v1/auth/login
- Backend:
  - Exchanges code for Entra ID access token
  - Fetches user profile from MS Graph
  - Auto-provisions user in database (first login)
  - Generates JWT access token (15min) and refresh token (7 days)
- Frontend stores tokens and uses access token for API calls
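The token lifetimes in the flow above can be illustrated with a small standard-library sketch. This is not the project's code in app/core/auth.py: the function names, the claim names (`sub`, `type`, `iat`, `exp`), and the choice of HS256 are assumptions made for illustration, and a real deployment would normally use a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ACCESS_TTL_SECONDS = 15 * 60            # 15 minutes, as stated in the flow
REFRESH_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days, as stated in the flow


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, the encoding JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id: str, token_type: str, ttl: int, secret: str,
                now: Optional[int] = None) -> str:
    """Build an HS256-signed token (hypothetical helper, illustrative claims)."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "type": token_type, "iat": now, "exp": now + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str, now: Optional[int] = None) -> dict:
    """Check the signature and expiry, then return the payload claims."""
    now = int(time.time()) if now is None else now
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected).encode(), signature.encode()):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload
```

Under these assumptions, the backend would call `issue_token(user_id, "access", ACCESS_TTL_SECONDS, secret)` and `issue_token(user_id, "refresh", REFRESH_TTL_SECONDS, secret)` after provisioning the user, with `secret` taken from JWT_SECRET.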
🗄️ Database Schema
Phase 2 Tables:
- regions: Geographical regions (UK, US, APAC, EU)
- departments: Departments within regions (UK/HR, US/IT, etc.)
- users: User accounts with RBAC roles
See implementation_plan.md for complete schema.
🛠️ Development
Local Development (without Docker)
# Install dependencies
pip install -r requirements.txt
# Run migrations
alembic upgrade head
# Seed data
python seed_data.py
# Run development server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Create a new migration
# Auto-generate migration from model changes
alembic revision --autogenerate -m "Description of changes"
# Review the generated migration file
# Edit if needed: alembic/versions/<timestamp>_description.py
# Apply migration
alembic upgrade head
Testing
# Run tests (Phase 2+)
pytest
# With coverage
pytest --cov=app tests/
📁 Project Structure
backend/
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI app entry point
│ ├── config.py # Settings (env vars)
│ ├── database.py # SQLAlchemy async setup
│ ├── models/ # ORM models
│ │ ├── user.py
│ │ └── taxonomy.py
│ ├── schemas/ # Pydantic models
│ │ ├── auth.py
│ │ └── user.py
│ ├── api/v1/
│ │ ├── router.py # API router
│ │ └── endpoints/
│ │ ├── auth.py # Authentication
│ │ └── health.py # Health check
│ ├── core/
│ │ ├── auth.py # JWT utilities
│ │ └── dependencies.py # RBAC middleware
│ └── utils/ # Utility functions
├── alembic/ # Database migrations
├── tests/ # Unit tests
├── requirements.txt
├── Dockerfile
└── seed_data.py # Initial data seeding
🔧 Environment Variables
See .env.example for all available configuration options.
Required:
- DATABASE_URL
- POSTGRES_PASSWORD
- REDIS_URL
- QDRANT_URL
- ENTRA_CLIENT_ID
- ENTRA_CLIENT_SECRET
- ENTRA_TENANT_ID
- JWT_SECRET
Optional (for future phases):
- OPENAI_API_KEY
- GOOGLE_API_KEY
- ANTHROPIC_API_KEY
- NOTEBOOKLLAMA_URL
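A quick way to catch a missing required variable before starting the server is a check like the one below. It is a minimal sketch: the helper name `missing_required` is hypothetical, and the project's own settings loader in app/config.py may validate these values differently.

```python
import os
from typing import List, Mapping

# Names taken from the "Required" list above.
REQUIRED_VARS = (
    "DATABASE_URL",
    "POSTGRES_PASSWORD",
    "REDIS_URL",
    "QDRANT_URL",
    "ENTRA_CLIENT_ID",
    "ENTRA_CLIENT_SECRET",
    "ENTRA_TENANT_ID",
    "JWT_SECRET",
)


def missing_required(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_required()
    if missing:
        raise SystemExit("Missing required environment variables: " + ", ".join(missing))
```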
🎯 Phase 5: NotebookLlama Integration ✅
Phase 5 implements the BFF (Backend for Frontend) pattern for NotebookLlama integration.
What Was Implemented
- Database Models (app/models/notebook.py)
  - NotebookSession: Maps internal sessions to external NotebookLlama sessions
  - UploadedFile: Tracks files uploaded to sessions
- Database Migration (alembic/versions/003_notebook_mode.py)
  - Creates notebook_sessions and uploaded_files tables
- NotebookLlama Client (app/core/notebook_client.py)
  - Wraps HTTP calls to external NotebookLlama API
  - Methods: create_notebook(), upload_document(), chat_stream()
- API Endpoints (app/api/v1/endpoints/notebook.py)
  - POST /api/v1/notebook/create - Create new notebook session
  - POST /api/v1/notebook/{id}/upload - Upload file to session
  - POST /api/v1/notebook/{id}/chat - Chat with streaming (SSE)
  - POST /api/v1/notebook/{id}/pin - Pin session (prevent expiration)
  - GET /api/v1/notebook/{id} - Get session details
  - DELETE /api/v1/notebook/{id} - Delete session
Key Features
- Auto-expiration: Sessions expire after 24 hours by default
- Pin feature: Users can pin sessions to prevent expiration
- Quota management: Tracks total file size per session (100MB limit)
- File upload: Validates file size, stores locally, proxies to NotebookLlama
- Chat streaming: Uses Server-Sent Events (SSE) for real-time responses
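The interaction between auto-expiration and pinning can be sketched as follows. The `SessionInfo` dataclass is a hypothetical stand-in for the NotebookSession model, and its field names are assumptions; only the 24-hour default and the rule that pinned sessions do not expire come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SESSION_TTL = timedelta(hours=24)  # default lifetime stated above


@dataclass
class SessionInfo:
    """Hypothetical stand-in for NotebookSession (field names assumed)."""
    created_at: datetime
    pinned: bool = False


def is_expired(session: SessionInfo, now: Optional[datetime] = None) -> bool:
    """A pinned session never expires; others expire once the TTL has elapsed."""
    if session.pinned:
        return False
    now = now or datetime.now(timezone.utc)
    return now - session.created_at >= SESSION_TTL
```

A cleanup job could then delete every session for which `is_expired` returns True.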
Usage Example
# 1. Create a notebook session
curl -X POST "http://localhost:8000/api/v1/notebook/create" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"title": "Financial Analysis"}'
# 2. Upload a file
curl -X POST "http://localhost:8000/api/v1/notebook/{session_id}/upload" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-F "file=@document.pdf"
# 3. Chat (frontend uses EventSource for SSE)
# See frontend documentation for streaming chat implementation
# 4. Pin the session
curl -X POST "http://localhost:8000/api/v1/notebook/{session_id}/pin" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
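For step 3, a non-browser client has to handle the Server-Sent Events framing itself. The sketch below covers only the generic SSE framing (blank-line-delimited events, `data:` fields, `:` comment lines). The function name is hypothetical, and the payload format inside each event is not documented here, so the parser leaves it as an opaque string.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event in a stream of lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.
```

The `lines` argument can be any iterable of text lines, for example the decoded line iterator of a streaming HTTP response from the chat endpoint.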
Configuration
Update your .env file:
# NotebookLlama service URL
NOTEBOOKLLAMA_URL=http://internal-notebook-server:8080
# File upload settings
MAX_UPLOAD_SIZE_MB=100
UPLOAD_DIR=/app/uploads
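How MAX_UPLOAD_SIZE_MB might feed the per-session quota check can be sketched as below. Two assumptions are made: that this setting is the one backing the 100MB session quota, and that a megabyte means 1024 * 1024 bytes. The helper names are hypothetical and not taken from the project.

```python
import os
from typing import Mapping

BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def max_upload_bytes(env: Mapping[str, str] = os.environ) -> int:
    """Read MAX_UPLOAD_SIZE_MB, falling back to the documented 100 MB."""
    return int(env.get("MAX_UPLOAD_SIZE_MB", "100")) * BYTES_PER_MB


def can_accept_upload(current_total_bytes: int, new_file_bytes: int, limit_bytes: int) -> bool:
    """Accept a non-empty file only if the session total stays within the limit."""
    return new_file_bytes > 0 and current_total_bytes + new_file_bytes <= limit_bytes
```

An upload handler would call `can_accept_upload` with the session's tracked total before storing the file, and reject the request otherwise.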
Apply Migration
# Run the Phase 5 migration
alembic upgrade head
📝 Next Steps
- Phase 3: SharePoint ingestion pipeline
- Phase 4: RAG logic & LLM router
- Phase 5: NotebookLlama integration ✅ COMPLETE
- Phase 6: Frontend development
See implementation_plan.md for details.