RAG Architecture

Knowledge retrieval pipeline: ingest documents → chunk + embed → store in vector DB → retrieve relevant chunks → LLM synthesis.

Key Takeaways

Qdrant is the vector DB of choice for Oliver RAG projects
AI content structuring before indexing improves retrieval quality (Enterprise Nexus)
Firecrawl /v1/crawl handles recursive website crawling for knowledge ingestion
Batch processing (10 pages at a time) for document merging avoids context overflow
Orphaned vectors accumulate over time — need periodic cleanup
LlamaIndex abstracts multi-model RAG (Sandbox NotebookLM)

When to Use

When users need to query a large body of documents in natural language.

Architecture

Enterprise Nexus RAG Pipeline

Sources:
  - File upload (PDF, Word, etc.)
  - SharePoint sync (Celery beat)
  - Website crawl (Firecrawl /v1/crawl)
    ↓
AI content structuring (pre-processing pass)
    ↓
Qdrant (vector store)
    ↓ on query:
Vector similarity search → top-k chunks
    ↓
LLM synthesis (prompt + chunks)
    ↓
Response to user
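
The query-time half of the diagram (similarity search, top-k selection, prompt assembly) can be sketched without any external service. This is a minimal illustration, not the Enterprise Nexus implementation: the function names, the prompt wording, and the toy 2-dimensional vectors standing in for real embeddings are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          store: List[Tuple[Sequence[float], str]],
          k: int = 5) -> List[str]:
    """Return the text of the k stored chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question: str, chunks: List[str]) -> str:
    """Assemble the synthesis prompt: numbered context chunks, then the question."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy 2-dimensional "embeddings" stand in for real model output.
store = [
    ([1.0, 0.0], "Qdrant stores the vectors."),
    ([0.0, 1.0], "Celery beat schedules the sync."),
    ([0.9, 0.1], "Search returns the top-k chunks."),
]
chunks = top_k([1.0, 0.0], store, k=2)
prompt = build_prompt("Where are vectors stored?", chunks)
```

In the real pipeline the vector store performs the ranking, so only the prompt assembly step would run in application code.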

Sandbox NotebookLM RAG

Document upload → LlamaIndex ingestion
    ↓
llm_factory.get_llm_by_type() → multi-model
    ↓
7 studio generators (flashcards, quiz, mindmap, slides, report, infographic, datatable)
    ↓
ElevenLabs (podcast audio synthesis)

Key Components

Qdrant Operations

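from qdrant_client import QdrantClient
from qdrant_client.models import PointIdsList, PointStruct

client = QdrantClient(url="http://localhost:6333")  # assumed local instance; adjust to your deployment

# Upsert vectors. Point IDs must be unsigned integers or UUIDs; if a document
# yields several chunks, each chunk needs its own ID or later chunks overwrite earlier ones.
client.upsert(collection_name="knowledge", points=[
    PointStruct(id=doc_id, vector=embedding, payload={"text": chunk, "source": filename})
])

# Search: the 5 nearest chunks to the query embedding
# (newer qdrant-client releases deprecate search() in favor of query_points())
hits = client.search(collection_name="knowledge", query_vector=query_embedding, limit=5)
chunks = [hit.payload["text"] for hit in hits]

# Cleanup orphaned vectors by explicit point ID
client.delete(collection_name="knowledge", points_selector=PointIdsList(points=orphaned_ids))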
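
Two pieces around these calls are left implicit: how point IDs are chosen, and how the list of orphaned IDs is built. The sketch below shows one possible approach, assuming each point carries its source filename in the payload as in the upsert above. The namespace value and both function names are hypothetical, and in a real collection the points would come from paging through Qdrant (for example with the client's scroll call) rather than from a literal list.

```python
import uuid
from typing import Dict, Iterable, List, Set

# Hypothetical namespace for this sketch; any fixed UUID works.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def chunk_point_id(source: str, chunk_index: int) -> str:
    """Deterministic UUID for one chunk, so re-ingesting a file overwrites its old points."""
    return str(uuid.uuid5(NAMESPACE, f"{source}#{chunk_index}"))


def find_orphaned_ids(points: Iterable[Dict], live_sources: Set[str]) -> List[str]:
    """Return IDs of points whose payload 'source' is no longer a live document."""
    return [p["id"] for p in points if p["payload"].get("source") not in live_sources]


# Stand-in for points read back from the collection.
points = [
    {"id": chunk_point_id("a.pdf", 0), "payload": {"source": "a.pdf"}},
    {"id": chunk_point_id("a.pdf", 1), "payload": {"source": "a.pdf"}},
    {"id": chunk_point_id("b.pdf", 0), "payload": {"source": "b.pdf"}},
]
# b.pdf has been removed from the document store, so its point is orphaned.
orphaned_ids = find_orphaned_ids(points, live_sources={"a.pdf"})
```

Deterministic IDs also keep re-ingestion idempotent: uploading the same file twice replaces its points instead of duplicating them.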

Firecrawl Site Crawl

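# Assumes firecrawl-py 1.x against the /v1 API, where maxDepth is a top-level crawl parameter
response = firecrawl.crawl_url(url, params={"maxDepth": 3, "scrapeOptions": {"formats": ["markdown"]}})
# response["data"]: list of pages, each with "markdown" and "metadata" (page URL in metadata["sourceURL"])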
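
Crawled markdown still has to be split into chunks before embedding. The following is a minimal sketch of one way to do that, splitting at headings and then at paragraph breaks. It assumes each page is a dict with "url" and "markdown" keys; the function names, the 1000-character limit, and the record layout are illustrative assumptions, so adapt the URL lookup to wherever your crawler reports the page address.

```python
import re
from typing import Dict, List


def chunk_markdown(markdown: str, max_chars: int = 1000) -> List[str]:
    """Split markdown at headings, then split oversized sections at paragraph breaks.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    # Zero-width split before each line that starts with 1-6 '#' and a space.
    sections = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    chunks: List[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        current = ""
        for para in section.split("\n\n"):
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(current)
    return chunks


def pages_to_records(pages: List[Dict]) -> List[Dict]:
    """Turn crawled pages into payload-ready records, one per chunk."""
    records = []
    for page in pages:
        for i, chunk in enumerate(chunk_markdown(page["markdown"])):
            records.append({"text": chunk, "source": page["url"], "chunk_index": i})
    return records


pages = [{
    "url": "https://example.com/docs",
    "markdown": "# Intro\nHello.\n\n## Setup\nInstall it.",
    "metadata": {},
}]
records = pages_to_records(pages)
```

Each record carries the same "text" and "source" keys used in the Qdrant payload, so it can be embedded and upserted directly.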

AI Pre-structuring (Enterprise Nexus)

Run LLM pass on raw document before chunking
Extracts structured sections, improves chunk boundaries
Batch: 10 pages at a time to stay within context window
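
The 10-page batching can be sketched as a small generator wrapped around the structuring call. This is an illustration under assumptions: the function names are hypothetical, and the LLM call is replaced by a stand-in so the example runs on its own.

```python
from typing import Callable, Iterator, List, Sequence


def batched(pages: Sequence[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` pages."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(pages), size):
        yield list(pages[start:start + size])


def structure_document(pages: Sequence[str],
                       structure_batch: Callable[[List[str]], str],
                       size: int = 10) -> str:
    """Run the structuring pass batch by batch and join the results in page order."""
    return "\n\n".join(structure_batch(batch) for batch in batched(pages, size))


def fake_structure_batch(batch: List[str]) -> str:
    """Stand-in for the LLM call; a real pass would send the batch to a model."""
    return f"## Section ({len(batch)} pages)\n" + "\n".join(batch)


pages = [f"page {n}" for n in range(1, 24)]  # 23 pages
batch_sizes = [len(b) for b in batched(pages)]
structured = structure_document(pages, fake_structure_batch)
```

Because each batch is structured independently, sections that straddle a batch boundary may be split; overlapping the batches by a page is one possible mitigation.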

Projects Using This Pattern

01 Projects/enterprise-ai-hub-nexus/Enterprise AI Hub Nexus — Qdrant + Firecrawl + AI pre-structuring + SharePoint sync + Celery
01 Projects/sandbox-notebookllamalm-nextjs/Sandbox NotebookLM — LlamaIndex multi-model + 7 studio generators + ElevenLabs podcast
01 Projects/Oliver-ai-bot_2.0/Oliver AI Bot 2.0 — RAG mode (85% complete)

Gotchas & Lessons

Qdrant vectors are NOT automatically deleted when source documents are removed — implement orphaned vector cleanup
AI pre-structuring significantly improves RAG quality but adds ~2-3s per document at ingest
SharePoint sync must refresh tokens proactively: by the time a Celery beat job runs, the access token may already have expired
M365 delegated scopes plus offline_access are required to obtain refresh tokens (Enterprise Nexus)
max_tokens of 8192 was too low for from-template PPTX generation in Sandbox; it was increased to 16000
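
The proactive refresh in the SharePoint lesson comes down to an expiry check at the start of each scheduled job. A minimal sketch, assuming the token's expiry time is stored alongside it; the function name and the 5-minute safety margin are assumptions, not values from the project.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(expires_at: datetime, now: datetime,
                  margin: timedelta = timedelta(minutes=5)) -> bool:
    """True if the token is expired or will expire within `margin` of `now`."""
    return now + margin >= expires_at


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = needs_refresh(now + timedelta(minutes=30), now)  # 30 minutes left
stale = needs_refresh(now + timedelta(minutes=2), now)   # inside the 5-minute margin
expired = needs_refresh(now - timedelta(hours=1), now)   # already expired
```

A sync job would call this check first and exchange the refresh token for a new access token whenever it returns True, before making any SharePoint request.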

Related

wiki/tech-patterns/redis-celery-worker-queue: SharePoint sync scheduling
wiki/tech-patterns/azure-ad-msal-auth: M365 auth for SharePoint
wiki/tech-patterns/python-ai-agents: LLM integration details
wiki/architecture/multi-agent-ai-systems: related AI pattern
