🦙 NotebookLlaMa - Enterprise Multi-User NotebookLM Clone

A production-ready, open-source alternative to Google's NotebookLM with multi-user support, document collections, AI-powered chat, and podcast generation.

🌟 Features

Core Capabilities

📓 Multi-Document Notebooks - Organize 1-100+ PDFs into collections
💬 Intelligent Chat - Ask questions across ALL documents in a notebook
🎙️ Podcast Generation - AI-generated audio conversations from your content
🤝 Team Collaboration - Share notebooks with colleagues
🔐 Enterprise Security - User authentication, data isolation, access controls
📊 Observability - Full tracing with Jaeger and OpenTelemetry

What Makes This Special

Notebook-First Design - Documents are organized into collections (like Google NotebookLM)
Multi-Document Intelligence - Chat queries search across ALL documents simultaneously
Source Attribution - See which document each answer came from
True Multi-Tenancy - Complete data isolation between users
Production Ready - Database-backed, scalable architecture

🔧 Prerequisites

Required Software

Docker Desktop
- Download: https://www.docker.com/products/docker-desktop
- Version: Latest stable release
- Purpose: Runs PostgreSQL database and monitoring tools
Python 3.13+
- Check version: python3 --version
- Download: https://www.python.org/downloads/
- Purpose: Application runtime

uv Package Manager

Install on macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Install on Windows:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Purpose: Fast Python package management

Required API Keys

You'll need accounts and API keys from:

OpenAI (for GPT-4 chat and responses)
- Sign up: https://platform.openai.com/signup
- Get API key: https://platform.openai.com/api-keys
- Pricing: ~$0.03 per 1K tokens
LlamaCloud (for document parsing and indexing)
- Sign up: https://cloud.llamaindex.ai
- Get API key: Dashboard → Settings → API Keys
- Free tier available
ElevenLabs (for podcast voice generation)
- Sign up: https://elevenlabs.io
- Get API key: Settings → API Keys
- Free tier: 10,000 characters/month

📦 Installation

Step 1: Clone the Repository

git clone https://bitbucket.org/zlalani/sandbox-notebookllamalm.git notebookllama
cd notebookllama

Step 2: Install Python Dependencies

# Install all dependencies
uv sync

This installs:

Streamlit (web UI)
SQLAlchemy (database ORM)
LlamaIndex (AI workflows)
OpenAI, ElevenLabs clients
PostgreSQL driver
And 25+ other packages

Step 3: Start Docker Services

# Start PostgreSQL, Jaeger, and Adminer
docker compose up -d

This starts:

PostgreSQL on port 5432 (database)
Jaeger on port 16686 (tracing UI)
Adminer on port 8080 (database admin)

Verify Docker is running:

docker ps

You should see 3 containers: instrumentation-postgres-1, instrumentation-jaeger-1, instrumentation-adminer-1

⚙️ Configuration

Step 4: Set Up Environment Variables

Create your .env file:

# Create an empty .env file (skip this if .env already exists)
touch .env

Edit .env with your favorite editor and add:

# ===== API Keys =====
OPENAI_API_KEY="sk-your-openai-api-key-here"
LLAMACLOUD_API_KEY="llx-your-llamacloud-api-key-here"
ELEVENLABS_API_KEY="sk_your-elevenlabs-api-key-here"

# ===== Database Configuration =====
pgql_db=postgres
pgql_user=postgres
pgql_psw=admin

# ===== LlamaCloud IDs (will be generated) =====
EXTRACT_AGENT_ID=""
LLAMACLOUD_PIPELINE_ID=""

Important:

Do NOT use quotes around database credentials
DO use quotes around API keys
Keep pgql_psw=admin (matches Docker setup)
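
As a quick sanity check, the quoting rules above can be verified with a short script. This is a minimal sketch and not part of the repository: `parse_env` and `check_env` are hypothetical helper names, and the parser only understands simple `KEY=value` lines.

```python
from pathlib import Path

# Keys named in this README. EXTRACT_AGENT_ID and LLAMACLOUD_PIPELINE_ID are
# filled in during Step 5, so they are deliberately not checked here.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "LLAMACLOUD_API_KEY",
    "ELEVENLABS_API_KEY",
    "pgql_db",
    "pgql_user",
    "pgql_psw",
]
DB_KEYS = {"pgql_db", "pgql_user", "pgql_psw"}


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(values):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for key in REQUIRED_KEYS:
        raw = values.get(key, "")
        if raw.strip("\"'") == "":
            problems.append(f"{key} is missing or empty")
        elif key in DB_KEYS and raw[0] in "\"'":
            problems.append(f"{key} should not be quoted")
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    for problem in check_env(parse_env(text)):
        print("WARNING:", problem)
```

Run it from the repository root. It prints one warning per problem and nothing if the file looks consistent with the rules above.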

Step 5: Create LlamaCloud Resources

Run these scripts to set up LlamaCloud extraction and indexing:

# Create extraction agent
uv run tools/create_llama_extract_agent.py

This will output an EXTRACT_AGENT_ID. Copy it to your .env file.

# Create indexing pipeline
uv run tools/create_llama_cloud_index.py

This will output a LLAMACLOUD_PIPELINE_ID. Copy it to your .env file.

Your .env should now look like:

OPENAI_API_KEY="sk-..."
LLAMACLOUD_API_KEY="llx-..."
ELEVENLABS_API_KEY="sk_..."
pgql_db=postgres
pgql_user=postgres
pgql_psw=admin
EXTRACT_AGENT_ID="cb7cdd30-81ea-4917-acd6-3bb505149289"
LLAMACLOUD_PIPELINE_ID="884e242c-86dd-4824-8347-e6dfb91d98dc"

🎬 First-Time Setup

Step 6: Stop Any Conflicting Services

Important: Stop local PostgreSQL if you have it installed:

# macOS (Homebrew)
brew services stop postgresql@14
brew services stop postgresql@15
killall postgres

# Linux (systemd)
sudo systemctl stop postgresql

# Windows
# Stop PostgreSQL service from Services panel

Why? Local PostgreSQL conflicts with the Docker PostgreSQL on port 5432.

Step 7: Initialize the Database

# Create all database tables
uv run src/notebookllama/init_database.py

You should see:

✓ Database connection successful
✓ Database tables created successfully

Tables created:
  - users
  - documents
  - notebooks
  - document_summaries
  - notebook_documents
  - chat_sessions
  - chat_messages
  - document_shares

If you see errors, check:

Docker is running: docker ps
PostgreSQL container is healthy: docker logs instrumentation-postgres-1
No local PostgreSQL is running: lsof -i :5432 (should only show Docker)

Step 8: Run Database Migration

# Migrate schema to notebook-first architecture
echo "yes" | uv run src/notebookllama/migrate_to_notebooks.py

This sets up the multi-document notebook structure.

🚀 Running the Application

Quick Start (Recommended)

# Use the automated startup script
./start.sh

This script:

Starts Docker services
Checks that the database is initialized
Stops conflicting PostgreSQL
Starts MCP server
Launches Streamlit app

Manual Start (More Control)

Terminal 1: Start MCP Server

uv run src/notebookllama/server.py

Keep this running. You should see:

INFO Starting MCP server 'MCP For NotebookLM'...
INFO Uvicorn running on http://127.0.0.1:8000

Terminal 2: Start Streamlit App

streamlit run src/notebookllama/App.py

You should see:

You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501

Verify Everything is Running

# Check all services
docker ps                    # Should show 3 containers
lsof -i :8000               # MCP server
lsof -i :8501               # Streamlit app
lsof -i :5432               # PostgreSQL (Docker only!)
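
Where `lsof` is not available (for example on Windows), a similar check can be sketched in Python. This is an illustrative script, not part of the repository. The `is_listening` helper is a hypothetical name and the ports are the ones listed in this README.

```python
import socket

# Ports taken from this README's service list.
SERVICES = {
    "PostgreSQL": 5432,
    "MCP server": 8000,
    "Streamlit": 8501,
}


def is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if is_listening(port) else "DOWN"
        print(f"{name:<12} port {port}: {state}")
```

Unlike `lsof`, this only shows that some process is listening on the port. It cannot tell whether port 5432 belongs to the Docker container or to a local PostgreSQL install.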

👤 User Guide

First-Time User Setup

Open your browser to http://localhost:8501
Create an account:
- Click "Sign Up" tab
- Enter email (any format, doesn't need to be real)
- Enter username (unique)
- Enter password (minimum 8 characters)
- Click "Sign Up"
You're in! You'll see the dashboard.

Creating Your First Notebook

Workflow:

Create Notebook → Upload PDFs → AI Processes → View Summary → Chat → Generate Podcast → Share

Step-by-Step:

1. Create Notebook

Click "Create New Notebook" (green button)
Name: "Q4 Marketing Analysis"
Description: "All marketing reports and research for Q4 2024"
Upload documents now? Upload 2-5 PDFs
Click "Create Notebook"

2. Wait for Processing

Each document takes 30-60 seconds
Progress bar shows status
Documents are processed sequentially

3. View Your Notebook

Automatically redirected to notebook detail
See all uploaded documents
View combined summaries from ALL documents
Browse highlights and Q&A

4. Chat with Your Notebook

Click "💬 Chat with Notebook"
Ask: "What are the main themes across all documents?"
AI searches ALL documents in your notebook
Responses show which document each answer came from

5. Generate a Podcast

In Notebook Detail, click "🎙️ Generate Podcast"
Click "Generate Now"
Wait 3-5 minutes
Listen to AI-generated 10-15 minute discussion
Download and share

6. Share with Team

Click "📤 Share Notebook"
Enter colleague's email
Choose permission: Read or Write
Click "Share"
They receive access to the entire notebook

Adding More Documents Later

Open any notebook
Click "➕ Add Documents"
Upload more PDFs
They're processed and added to the collection
Summaries and chat are automatically updated

Managing Notebooks

Edit: Change name/description
Delete: Removes notebook and all data
Remove Document: Take a document out of the notebook
View Shared: See notebooks others shared with you

🏗️ Architecture

Application Structure

NotebookLlaMa/
├── src/notebookllama/
│   ├── App.py                      # Main entry point (dashboard)
│   ├── auth.py                     # Authentication system
│   ├── database.py                 # SQLAlchemy ORM models
│   ├── notebook_manager.py         # Notebook CRUD operations
│   ├── document_manager.py         # Document CRUD operations
│   ├── workflow.py                 # LlamaIndex workflow
│   ├── utils.py                    # LlamaCloud API calls
│   ├── audio.py                    # Podcast generation
│   ├── server.py                   # MCP server (for chat)
│   ├── init_database.py            # Database initialization
│   ├── migrate_to_notebooks.py     # Database migration
│   └── pages/
│       ├── 1_My_Notebooks.py       # List and create notebooks
│       ├── 2_Notebook_Detail.py    # View/manage notebook
│       ├── 3_Notebook_Chat.py      # Chat interface
│       ├── 4_Shared_Notebooks.py   # Shared notebooks view
│       └── 5_Observability_Dashboard.py  # Performance monitoring
├── compose.yaml                    # Docker services configuration
├── pyproject.toml                  # Python dependencies
├── start.sh                        # Automated startup script
└── Documentation/
    ├── README.md                   # This file
    ├── ENTERPRISE_SETUP.md         # Detailed setup guide
    ├── SIMPLIFIED_PLAN.md          # Architecture overview
    ├── IMPLEMENTATION_SUMMARY.md   # Technical details
    └── CURRENT_STATUS.md           # Known issues

Technology Stack

Frontend: Streamlit (Python web framework)
Backend: FastMCP, LlamaIndex
Database: PostgreSQL with SQLAlchemy ORM
AI Services:
- OpenAI GPT-4 (chat, structured responses)
- LlamaCloud (document parsing, extraction, indexing)
- ElevenLabs (text-to-speech for podcasts)
Observability: Jaeger, OpenTelemetry
Authentication: bcrypt password hashing

Database Schema

users
├── notebooks (collections of documents)
│   ├── notebook_documents (junction table)
│   │   └── documents (PDF files)
│   │       └── document_summaries (AI analysis)
│   ├── chat_sessions
│   │   └── chat_messages
│   └── document_shares (sharing permissions)

8 Tables Total:

users - User accounts
notebooks - Document collections
documents - Uploaded PDFs
notebook_documents - Links documents to notebooks
document_summaries - AI-generated summaries, Q&A, highlights
chat_sessions - Conversation sessions
chat_messages - Individual messages
document_shares - Sharing and permissions
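
To illustrate how the `notebook_documents` junction table links documents to notebooks, here is a minimal sketch. It makes several assumptions: it uses SQLite from the Python standard library rather than the project's PostgreSQL and SQLAlchemy models, it covers only four of the eight tables, and every column name is hypothetical.

```python
import sqlite3

# Simplified, hypothetical columns; the real models live in database.py.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL);
CREATE TABLE notebooks (
    id INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES users(id),
    name TEXT NOT NULL
);
CREATE TABLE documents (id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
-- Junction table: one row per (notebook, document) link.
CREATE TABLE notebook_documents (
    notebook_id INTEGER NOT NULL REFERENCES notebooks(id),
    document_id INTEGER NOT NULL REFERENCES documents(id),
    PRIMARY KEY (notebook_id, document_id)
);
"""


def documents_in_notebook(conn, notebook_id):
    """Return the filenames of all documents linked to one notebook."""
    rows = conn.execute(
        """
        SELECT d.filename
        FROM documents AS d
        JOIN notebook_documents AS nd ON nd.document_id = d.id
        WHERE nd.notebook_id = ?
        ORDER BY d.filename
        """,
        (notebook_id,),
    ).fetchall()
    return [row[0] for row in rows]


def build_demo():
    """Create an in-memory database with one notebook and three documents."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO users (id, username) VALUES (1, 'alice')")
    conn.execute(
        "INSERT INTO notebooks (id, owner_id, name) "
        "VALUES (1, 1, 'Q4 Marketing Analysis')"
    )
    conn.executemany(
        "INSERT INTO documents (id, filename) VALUES (?, ?)",
        [(1, "report.pdf"), (2, "research.pdf"), (3, "unrelated.pdf")],
    )
    # Only the first two documents are linked to the notebook.
    conn.executemany(
        "INSERT INTO notebook_documents VALUES (?, ?)", [(1, 1), (1, 2)]
    )
    return conn


if __name__ == "__main__":
    print(documents_in_notebook(build_demo(), 1))
```

Because documents are linked through the junction table and not stored on the notebook row itself, removing a document from a notebook only needs to delete the link row.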

Data Flow

User uploads PDF
    ↓
Document saved to database
    ↓
Sent to LlamaCloud for parsing
    ↓
Sent to LlamaExtract for analysis
    ↓
Summary/Q&A/Highlights generated
    ↓
Saved to document_summaries table
    ↓
Added to LlamaCloud index
    ↓
Available for chat
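
The flow above is strictly sequential: a document becomes available for chat only after every earlier step has succeeded. The sketch below models that ordering with placeholder stages. It is an illustration under stated assumptions, not the project's actual workflow code: `DocumentJob`, `run_pipeline`, and the stage names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class DocumentJob:
    """Tracks how far one uploaded PDF has progressed (hypothetical type)."""
    filename: str
    completed: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None

    @property
    def available_for_chat(self) -> bool:
        # Chat needs the final indexing stage to have finished without errors.
        return self.failed_stage is None and "index" in self.completed


Stage = Tuple[str, Callable[[DocumentJob], None]]


def run_pipeline(job: DocumentJob, stages: List[Stage]) -> DocumentJob:
    """Run stages in order and stop at the first one that raises."""
    for name, stage in stages:
        try:
            stage(job)
        except Exception:
            job.failed_stage = name
            break
        job.completed.append(name)
    return job


def noop(job: DocumentJob) -> None:
    """Placeholder for a real stage (database write, LlamaCloud call, ...)."""


# Stage names mirror the steps in the diagram above.
STAGES: List[Stage] = [
    ("save", noop),
    ("parse", noop),
    ("extract", noop),
    ("store_summary", noop),
    ("index", noop),
]

if __name__ == "__main__":
    job = run_pipeline(DocumentJob("report.pdf"), STAGES)
    print(job.completed, job.available_for_chat)
```

Recording the failed stage makes it easier to tell whether a document that never shows a summary stopped at parsing, extraction, or saving.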

🐛 Troubleshooting

Common Issues

1. Database Connection Failed

Symptom: role "postgres" does not exist or connection errors

Solution:

# Stop local PostgreSQL
brew services stop postgresql@14
killall postgres

# Restart Docker with fresh database
docker compose down -v
docker compose up -d

# Wait 5 seconds, then reinitialize
sleep 5
uv run src/notebookllama/init_database.py
echo "yes" | uv run src/notebookllama/migrate_to_notebooks.py

2. MCP Server Not Responding

Symptom: Chat doesn't work, 500 errors

Solution:

# Check if server is running
lsof -i :8000

# If not running or crashed (caution: killall python stops every Python process, including Streamlit):
killall python
uv run src/notebookllama/server.py > server.log 2>&1 &

# Check logs
tail -f server.log

3. Document Processing Fails

Symptom: "Error processing document" or uploads fail

Check:

# Verify API keys are set
grep OPENAI_API_KEY .env
grep LLAMACLOUD_API_KEY .env
grep ELEVENLABS_API_KEY .env

# Verify LlamaCloud IDs exist
grep EXTRACT_AGENT_ID .env
grep LLAMACLOUD_PIPELINE_ID .env

# Re-run LlamaCloud setup if needed
uv run tools/create_llama_extract_agent.py
uv run tools/create_llama_cloud_index.py

4. Port Already in Use

Symptom: "Address already in use" errors

Solution:

# Port 5432 (PostgreSQL)
lsof -i :5432
killall postgres  # Kill local postgres

# Port 8000 (MCP Server)
lsof -i :8000
kill -9 <PID>

# Port 8501 (Streamlit)
lsof -i :8501
kill -9 <PID>

5. Summaries Not Saving

Symptom: Documents show "Processing... Summary not yet available"

Cause: MCP server crashed during processing (known bug in MCP library)

Solution:

The newest code bypasses MCP for document processing
Summaries should now save reliably
If old documents have no summaries, re-upload them

6. Import Errors

Symptom: cannot import name 'core' from 'llama_index'

Solution:

# Reinstall/upgrade packages
uv sync --reinstall

🚢 Deployment

For Production Use

Environment Setup

Use Managed PostgreSQL
- AWS RDS, Google Cloud SQL, or Azure Database
- Enable automated backups
- Set up read replicas for scaling
Environment Variables
- Use secrets management (AWS Secrets Manager, etc.)
- Never commit API keys to git
- Rotate keys regularly
Security Hardening
- Enable HTTPS/TLS
- Set up firewall rules
- Implement rate limiting
- Add CSRF protection
- Set session timeout (30 minutes recommended)

Scaling Considerations

Single Server (< 50 users):

# Run everything on one machine
docker compose up -d
uv run src/notebookllama/server.py &
streamlit run src/notebookllama/App.py

Multi-Server (50-1000 users):

Load balancer (nginx/HAProxy)
Multiple Streamlit instances
Separate MCP server instances
Managed PostgreSQL
Redis for session caching

Enterprise (1000+ users):

Kubernetes deployment
Auto-scaling groups
CDN for static assets
Separate database for each service
Message queue (RabbitMQ/Kafka) for async tasks
Dedicated job workers for document processing

Monitoring

# Access Jaeger UI
http://localhost:16686

# Access database admin
http://localhost:8080

Set up alerts for:

API rate limits
Database connection pool exhaustion
Disk space (for podcasts and uploads)
Processing failures

📊 Usage Metrics

Performance

Document Processing: 30-60 seconds per PDF
Chat Response: 3-5 seconds per query
Podcast Generation: 3-5 minutes for 10-minute audio
Page Load: < 1 second with caching

Resource Requirements

Minimum:

4 GB RAM
2 CPU cores
20 GB disk space

Recommended:

8 GB RAM
4 CPU cores
100 GB disk space (for documents and podcasts)

API Usage Estimates

Per Document (assuming 20-page PDF):

LlamaCloud: ~$0.10 (parsing + extraction)
OpenAI: ~$0.50 (summary + Q&A generation)
Total: ~$0.60 per document

Per Podcast (10 minutes):

OpenAI: ~$0.20 (conversation script)
ElevenLabs: ~$0.30 (voice generation)
Total: ~$0.50 per podcast

Per Chat Message:

OpenAI: ~$0.01 per query
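
For budgeting, the per-unit figures above can be combined into a rough total. The snippet below is a minimal sketch. The `estimate_cost` function is a hypothetical helper, and the constants are simply the approximate estimates quoted in this section, not measured prices.

```python
# Approximate per-unit estimates copied from the figures above (USD).
COST_PER_DOCUMENT = 0.60      # 20-page PDF: LlamaCloud ~0.10 + OpenAI ~0.50
COST_PER_PODCAST = 0.50       # 10 minutes: OpenAI ~0.20 + ElevenLabs ~0.30
COST_PER_CHAT_MESSAGE = 0.01  # OpenAI, per query


def estimate_cost(documents, podcasts, chat_messages):
    """Return a rough total API cost in USD, rounded to cents."""
    total = (
        documents * COST_PER_DOCUMENT
        + podcasts * COST_PER_PODCAST
        + chat_messages * COST_PER_CHAT_MESSAGE
    )
    return round(total, 2)


if __name__ == "__main__":
    # 40 documents, 5 podcasts, 500 chat messages:
    # 24.00 + 2.50 + 5.00 = 31.50
    print(estimate_cost(40, 5, 500))
```

Actual costs depend on document length, the model used, and provider pricing, so treat the result as an order-of-magnitude figure.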

🔒 Security

Current Implementation

✅ Password hashing with bcrypt (salt rounds: 12)
✅ SQL injection protection via SQLAlchemy ORM
✅ Session-based authentication
✅ Per-user data isolation
✅ Document access controls
✅ Granular sharing permissions

Recommended Additions for Production

HTTPS/TLS encryption
Rate limiting per user
Input validation and sanitization
CSRF tokens
Session expiration (30-60 minutes)
Two-factor authentication (2FA)
Audit logging
IP allowlisting
API key rotation

📚 Documentation Files

README.md (this file) - Main documentation
ENTERPRISE_SETUP.md - Detailed enterprise features guide
SIMPLIFIED_PLAN.md - Architecture and design decisions
IMPLEMENTATION_SUMMARY.md - Technical implementation details
CURRENT_STATUS.md - Known issues and limitations
FINAL_README.md - Quick reference guide

🤝 Contributing

We welcome contributions! Areas for improvement:

Notebook-level synthesis (consolidate summaries across docs)
Cross-document Q&A generation
Podcast length controls
Background job queue for processing
Advanced search across all notebooks
Export functionality (PDF, Word)
Mobile-responsive UI
API access with tokens

📝 License

MIT License - See LICENSE file for details

🙏 Acknowledgments

Built with:

LlamaIndex - AI application framework
LlamaCloud - Document processing
Streamlit - Web interface
OpenAI - Language models
ElevenLabs - Voice synthesis
PostgreSQL - Database
Jaeger - Distributed tracing

Original project: run-llama/notebookllama

📞 Support

Issues: Create an issue in the repository
Questions: Check documentation files first
Bugs: Include error logs and steps to reproduce

🎓 Quick Reference

Useful Commands

# Start everything
./start.sh

# Restart MCP server (caution: killall python stops every Python process, including Streamlit)
killall python && uv run src/notebookllama/server.py &

# Check database
PGPASSWORD=admin psql -h localhost -U postgres -d postgres -c "SELECT * FROM notebooks;"

# View logs
tail -f server.log

# Reset database (CAUTION: deletes all data!)
docker compose down -v
docker compose up -d
uv run src/notebookllama/init_database.py
echo "yes" | uv run src/notebookllama/migrate_to_notebooks.py

Service URLs

Streamlit App: http://localhost:8501
MCP Server: http://localhost:8000/mcp/
Jaeger Tracing: http://localhost:16686
Database Admin: http://localhost:8080
- System: PostgreSQL
- Server: instrumentation-postgres-1
- Username: postgres
- Password: admin
- Database: postgres

Made with ❤️ using Claude Code

Last Updated: October 1, 2025
