# 🦙 Sandbox-NotebookLM

Enterprise-ready alternative to Google NotebookLM with multi-user support, powered by LlamaIndex and multiple AI models.

**Live Demo:** https://ai-sandbox.oliver.solutions/notebookllama/

**Repository:** https://bitbucket.org/zlalani/sandbox-notebookllamalm-nextjs

---

## 🚀 Quick Start (Docker)

### Prerequisites

- Docker and Docker Compose
- Git

### 1. Clone

```bash
git clone git@bitbucket.org:zlalani/sandbox-notebookllamalm-nextjs.git
cd sandbox-notebookllamalm-nextjs
```

### 2. Configure environment

Create `backend/.env`:

```bash
# Core (required)
OPENAI_API_KEY=sk-...
LLAMACLOUD_API_KEY=llx-...

# Model-specific (optional)
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AI...
GROQ_API_KEY=gsk_...
DEEPSEEK_API_KEY=sk-...

# Podcast generation (optional)
ELEVENLABS_API_KEY=...

# Database
pgql_user=postgres
pgql_psw=admin
pgql_db=postgres_nextjs

# Microsoft SSO (optional)
AZURE_CLIENT_ID=your-client-id
AZURE_AUTHORITY=https://login.microsoftonline.com/your-tenant-id
AZURE_REDIRECT_URI=https://your-domain.com/notebookllama/
```

Create `frontend/.env.production` (baked into build):

```bash
NEXT_PUBLIC_API_URL=https://your-domain.com/notebookllama-back
NEXT_PUBLIC_WS_URL=wss://your-domain.com/notebookllama-back
NEXT_PUBLIC_AZURE_CLIENT_ID=your-client-id
NEXT_PUBLIC_AZURE_AUTHORITY=https://login.microsoftonline.com/your-tenant-id
NEXT_PUBLIC_AZURE_REDIRECT_URI=https://your-domain.com/notebookllama/
```

For local dev, use `frontend/.env.local`:

```bash
NEXT_PUBLIC_API_URL=http://localhost:9000
NEXT_PUBLIC_WS_URL=ws://localhost:9000
```

### 3. Build and start

```bash
docker compose up -d --build
```

This starts: **backend** (port 9000), **frontend** (port 4000), **PostgreSQL** (port 5433), **Redis** (port 6380).

### 4. Initialize database

On first run:

```bash
docker compose exec backend /app/.venv/bin/python -c \
  "import sys; sys.path.insert(0, '/app/src/notebookllama'); from database import init_db; init_db(); print('Done')"
```

---

## 🌐 Access Points

| Service | URL |
|---------|-----|
| Frontend | http://localhost:4000 |
| Backend API | http://localhost:9000 |
| API Docs (Swagger) | http://localhost:9000/docs |
| Health Check | http://localhost:9000/api/health |

---

## 📚 Features

### Core

- **Multi-Notebook Management**: organize documents into collections
- **6 AI Models**: GPT-5, Claude 4.5, Gemini 2.5 Pro, GPT-4o, Gemini Flash, GPT-4
- **40+ File Formats**: PDF, DOCX, PPTX, XLSX, CSV, images (OCR), audio (transcription), video (multimodal)
- **Background Processing**: non-blocking uploads with real-time status
- **Document Summaries**: AI-generated summaries, highlights, Q&A pairs

### Analysis

- **Cross-Document Synthesis**: themes, insights, comparative findings across all docs (persists to DB)
- **Studio**: 7 additional output types generated from your documents:
  - **Flashcards**: 15-20 study cards with 3D flip animation
  - **Quiz**: 10-12 multiple choice questions with scoring
  - **Mind Map**: SVG radial tree visualization
  - **Slide Deck**: 8-12 slide presentation with PPTX download
  - **Report**: executive summary + sections + conclusions with PDF download
  - **Infographic**: visual blocks with stats and emojis
  - **Data Table**: structured comparison table

### Chat

- **Real-time WebSocket Chat**: ask questions across all documents
- **Multiple Sessions**: organize conversations, rename, share, delete
- **Source Citations**: see which documents were used
- **Markdown Rendering**: formatted AI responses

### Podcast

- **AI-Generated Podcast**: two-voice audio discussion from your documents
- **Customizable**: length (5-30 min), voices, theme, instructions
- **Built-in Player**: listen or download directly

### Collaboration

- **Notebook Sharing**: share by email with Read / Write / Share permissions
- **Shared Sessions**: make chat sessions visible to collaborators

### Admin

- System stats, cost estimation, user management, task monitoring

### Auth

- Local email/password + Microsoft SSO (MSAL PKCE)
- Role-based access (admin / regular user)

---

## 🎯 AI Models

| Model | Provider | Input | Output |
|-------|----------|-------|--------|
| GPT-5 | OpenAI | $1.25/1M | $10/1M |
| Claude 4.5 | Anthropic | $3/1M | $15/1M |
| Gemini 2.5 Pro | Google | $1.25/1M | $5/1M |
| GPT-4o | OpenAI | $5/1M | $15/1M |
| Gemini Flash | Google | $0.075/1M | $0.30/1M |
| GPT-4 | OpenAI | $30/1M | $60/1M |

---

## 🏗️ Architecture

### Tech Stack

**Frontend:** Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS 4, React Query, Zustand, MSAL, WebSocket

**Backend:** FastAPI, SQLAlchemy, Python 3.13, uv package manager

**AI/ML:** LlamaCloud (indexing), LlamaIndex (RAG), LlamaParse (parsing), LlamaExtract (structured extraction), Gemini 2.5 Pro (video multimodal), ElevenLabs (voice), python-pptx, weasyprint

**Infrastructure:** Docker Compose, PostgreSQL, Redis

### Database Schema (10 Tables)

| Table | Key columns |
|-------|-------------|
| users | email, username, is_admin, auth_provider |
| notebooks | name, model_type, synthesis_data, studio_data, podcast_path |
| documents | filename, llamacloud_file_id |
| notebook_documents | notebook_id, document_id |
| document_summaries | summary, highlights, questions, answers |
| chat_sessions | title, is_shared, notebook_id |
| chat_messages | role, content, sources |
| document_shares | permission_level (READ/WRITE/SHARE/ADMIN) |
| background_tasks | status, task_type |

---

## 🔌 API Endpoints

Full docs: `http://localhost:9000/docs`

**Notebooks:**

- `GET/POST /api/notebooks/`: list / create
- `GET/PUT/DELETE /api/notebooks/{id}`: get / update / delete
- `POST /api/notebooks/{id}/synthesis`: generate cross-doc analysis
- `GET /api/notebooks/{id}/synthesis`: get saved synthesis
- `POST /api/notebooks/{id}/podcast`: start podcast generation
- `POST /api/notebooks/{id}/share`: share with user
- `GET /api/notebooks/{id}/studio`: get all saved Studio outputs
- `POST /api/notebooks/{id}/studio/{type}`: generate (flashcards / quiz / mindmap / slides / report / infographic / datatable)
- `GET /api/notebooks/{id}/studio/slides/download`: PPTX file
- `GET /api/notebooks/{id}/studio/report/download`: PDF file

**Documents:**

- `POST /api/documents/upload/{notebookId}`: upload file
- `GET /api/documents/task/{taskId}`: task status
- `GET /api/documents/{documentId}/summary`: get analysis

**Chat:**

- `WS /api/chat/ws/{notebookId}?session_id={id}`: real-time chat
- `GET /api/chat/{notebookId}/sessions`: list sessions
- `POST /api/chat/{notebookId}/sessions`: create session

**Auth:**

- `POST /api/auth/signup`: register
- `POST /api/auth/login`: login
- `POST /api/auth/microsoft`: SSO login

---

## 🔐 Microsoft SSO Setup

1. Register Azure AD app (SPA with PKCE, `User.Read` permission)
2. Add redirect URI matching `NEXT_PUBLIC_AZURE_REDIRECT_URI`
3. Set `AZURE_CLIENT_ID`, `AZURE_AUTHORITY`, `AZURE_REDIRECT_URI` in `backend/.env`
4. Set matching `NEXT_PUBLIC_*` vars in frontend env

SSO users are auto-created on first login. Local accounts are merged if the same email logs in via SSO.

---

## 📖 Usage Guide

**Create a notebook:**

1. My Notebooks → New Notebook → choose name + AI model

**Upload documents:**

1. Open notebook → Select Files → wait for processing (~1 min/doc, ~1 min/10 min of video)

**Studio:**

1. Open notebook with processed documents
2. Scroll to Studio section → click any card (Flashcards, Quiz, etc.)
3. Results load or generate fresh; PPTX/PDF download available for Slides/Report

**Cross-doc analysis:**

1. Click "Cross-Doc Analysis" → wait 30-60s → results persist

**Podcast:**

1. Click "Podcast" → choose length + voices → "Generate" → wait 3-5 min

**Chat:**

1. Click "Chat" → create/select session → ask questions

---

## 🐛 Troubleshooting

**Backend 500 on all routes after deploy:**

```bash
docker compose build backend && docker compose up -d backend
```

**`column notebooks.studio_data does not exist`:**

```bash
docker compose exec backend /app/.venv/bin/python -c \
  "import sys; sys.path.insert(0, '/app/src/notebookllama'); from database import run_studio_migration; run_studio_migration(); print('Done')"
docker compose restart backend
```

**Frontend shows old UI:**

```bash
docker compose build frontend && docker compose up -d frontend
```

**Database errors:**

```bash
docker compose logs postgres --tail=30
docker compose restart postgres
```

**Podcast stuck:**

Connect to DB and reset stuck tasks:

```bash
docker compose exec postgres psql -U postgres -d postgres_nextjs -c \
  "UPDATE background_tasks SET status='FAILED', error_message='Timeout', completed_at=NOW() WHERE task_type='podcast_generation' AND status='IN_PROGRESS';"
```

---

## 🚢 Production Deployment

See `scripts/` for automated migration scripts:

- `scripts/1_backup.sh`: backup DB + files before migration
- `scripts/2_deploy.sh`: pull, build, switch from systemd to Docker
- `scripts/3_cleanup.sh`: remove build artifacts after verification

For developer notes, see [CLAUDE.md](./CLAUDE.md).

---

## 📁 Project Structure

```
sandbox-notebookllamalm-nextjs/
├── backend/
│   ├── src/
│   │   ├── api/
│   │   │   ├── main.py
│   │   │   └── routes/               auth, notebooks, documents, chat, admin
│   │   └── notebookllama/
│   │       ├── database.py           SQLAlchemy models
│   │       ├── studio_generators.py  7 Studio LLM generators
│   │       ├── audio.py              podcast generation
│   │       ├── background_tasks.py
│   │       └── llm_factory.py
│   ├── Dockerfile
│   └── .env
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   ├── notebooks/[id]/page.tsx   main notebook page
│   │   │   └── ...
│   │   ├── lib/api.ts
│   │   └── types/index.ts
│   ├── Dockerfile
│   └── .env.production
├── scripts/
│   ├── 1_backup.sh
│   ├── 2_deploy.sh
│   └── 3_cleanup.sh
├── docker-compose.yml
├── CLAUDE.md
└── README.md
```

---

**Version:** 3.0.0 | **Updated:** March 2026 | **Status:** Production
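
The pricing table in the AI Models section can be turned into a rough per-request cost estimate. The sketch below is only an illustration: it assumes the listed prices are US dollars per 1M tokens, and `PRICES` and `estimate_cost` are hypothetical names, not part of the project's codebase or its Admin cost estimation.

```python
# Prices copied from the AI Models table (input, output),
# assumed to be USD per 1M tokens.
# PRICES and estimate_cost are illustrative names, not project code.
PRICES = {
    "GPT-5": (1.25, 10.00),
    "Claude 4.5": (3.00, 15.00),
    "Gemini 2.5 Pro": (1.25, 5.00),
    "GPT-4o": (5.00, 15.00),
    "Gemini Flash": (0.075, 0.30),
    "GPT-4": (30.00, 60.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Compare all models for 200k input tokens and 5k output tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 200_000, 5_000):.4f}")
```

Keeping the prices in one dictionary makes it easy to update them when provider pricing changes.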
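
Uploads are processed in the background, so a client has to poll `GET /api/documents/task/{taskId}` until the task finishes. The following is a minimal sketch of such a polling loop, not the project's own client. It assumes the endpoint returns a JSON object with a `status` field and that `COMPLETED` is the success value (only `FAILED` and `IN_PROGRESS` appear in the troubleshooting SQL). It also omits authentication, which the real API may require. `fetch_task` and `wait_for_task` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:9000"  # backend URL from the Access Points table


def fetch_task(task_id: str) -> dict:
    """GET /api/documents/task/{taskId} and decode the JSON body.

    Assumption: the response is a JSON object; auth headers are omitted.
    """
    url = f"{BASE_URL}/api/documents/task/{task_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], dict] = fetch_task,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reaches a terminal status or the timeout elapses."""
    # Assumption: "COMPLETED" is a guess at the success value;
    # "FAILED" is taken from the troubleshooting SQL.
    terminal = {"COMPLETED", "FAILED"}
    waited = 0.0
    while True:
        task = fetch(task_id)
        if str(task.get("status", "")).upper() in terminal:
            return task
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still running after {timeout}s")
        sleep(interval)
        waited += interval
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without a running backend: `wait_for_task("some-id", fetch=my_fake)` exercises the same logic against canned responses.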