obsidian/wiki/architecture/rag-architecture.md
2026-04-15 10:48:47 +01:00

title: RAG Architecture (Retrieval-Augmented Generation)
aliases: rag, vector-search, knowledge-retrieval, qdrant
tags: rag, qdrant, vector-search, llamaindex, firecrawl, architecture
sources: 01 Projects/enterprise-ai-hub-nexus, 01 Projects/sandbox-notebookllamalm-nextjs, 01 Projects/Oliver-ai-bot_2.0
created: 2026-04-15
updated: 2026-04-15

RAG Architecture

Knowledge retrieval pipeline: ingest documents → chunk + embed → store in vector DB → retrieve relevant chunks → LLM synthesis.
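
The stages above can be sketched end to end in a few lines. This is a toy illustration only: a bag-of-words counter stands in for a real embedding model, a Python list stands in for the vector DB, and every function name here (chunk_text, embed, build_index, retrieve) is hypothetical rather than taken from any of the projects.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts (a real pipeline calls an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents: list[str]) -> list[tuple[Counter, str]]:
    """Ingest: chunk every document and store (vector, chunk text) pairs."""
    return [(embed(chunk), chunk) for doc in documents for chunk in chunk_text(doc)]


def retrieve(query: str, index: list[tuple[Counter, str]], k: int = 2) -> list[str]:
    """Return the k chunk texts most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```

The retrieved chunks would then be passed to the LLM together with the user's question for synthesis.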

Key Takeaways

  • Qdrant is the vector DB of choice for Oliver RAG projects
  • AI content structuring before indexing improves retrieval quality (Enterprise Nexus)
  • Firecrawl /v1/crawl handles recursive website crawling for knowledge ingestion
  • Batch processing (10 pages at a time) for document merging avoids context overflow
  • Orphaned vectors accumulate over time and need periodic cleanup
  • LlamaIndex abstracts multi-model RAG (Sandbox NotebookLM)

When to Use

When users need to query a large body of documents in natural language.

Architecture

Enterprise Nexus RAG Pipeline

Sources:
  - File upload (PDF, Word, etc.)
  - SharePoint sync (Celery beat)
  - Website crawl (Firecrawl /v1/crawl)
    ↓
AI content structuring (pre-processing pass)
    ↓
Chunk + embed
    ↓
Qdrant (vector store)
    ↓ on query:
Vector similarity search → top-k chunks
    ↓
LLM synthesis (prompt + chunks)
    ↓
Response to user
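
The "LLM synthesis (prompt + chunks)" step might look roughly like the following. The note does not record the actual prompt used in Enterprise Nexus, so the wording, the build_prompt name, and the max_chars budget are all assumptions made for illustration.

```python
def build_prompt(question: str, chunks: list[str], max_chars: int = 4000) -> str:
    """Assemble a grounded prompt from the top-k chunks, stopping at a size budget."""
    context_parts = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        entry = f"[{number}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # drop lower-ranked chunks rather than overflow the context
        context_parts.append(entry)
        used += len(entry)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the chunks lets the model cite which passage supports an answer. Because chunks arrive ranked by similarity, truncating from the end discards the least relevant ones first.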

Sandbox NotebookLM RAG

Document upload → LlamaIndex ingestion
    ↓
llm_factory.get_llm_by_type() → multi-model
    ↓
7 studio generators (flashcards, quiz, mindmap, slides, report, infographic, datatable)
    ↓
ElevenLabs (podcast audio synthesis)
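
The note names llm_factory.get_llm_by_type() but does not show its signature or return type. As a rough sketch of the factory idea only, the version below maps a type name to a configuration record; the LLMConfig class, the registry keys, and the provider and model strings are placeholders, not the Sandbox implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str
    max_tokens: int


# Placeholder registry: real entries would name actual providers and models.
_REGISTRY = {
    "fast": LLMConfig(provider="provider-a", model="small-model", max_tokens=4096),
    "long_output": LLMConfig(provider="provider-b", model="large-model", max_tokens=16000),
}

# Hypothetical mapping from studio generator to the LLM type it needs.
GENERATOR_LLM_TYPE = {"flashcards": "fast", "quiz": "fast", "slides": "long_output"}


def get_llm_by_type(llm_type: str) -> LLMConfig:
    """Look up the configuration for an LLM type, failing loudly on unknown names."""
    try:
        return _REGISTRY[llm_type]
    except KeyError:
        raise ValueError(f"Unknown LLM type: {llm_type!r}") from None
```

Keeping the per-generator choice in one table makes limits such as max_tokens easy to adjust for a single generator without touching the others.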

Key Components

Qdrant Operations

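from uuid import uuid4
from qdrant_client import QdrantClient
from qdrant_client.models import PointIdsList, PointStruct

client = QdrantClient(url="http://localhost:6333")  # assumed local instance

# Upsert vectors: one point per chunk. IDs must be unique per chunk (unsigned int or UUID);
# reusing a document-level ID would overwrite earlier chunks of the same document.
client.upsert(collection_name="knowledge", points=[
    PointStruct(id=str(uuid4()), vector=embedding,
                payload={"text": chunk, "source": filename, "doc_id": doc_id})
])

# Search: top 5 nearest chunks (newer qdrant-client releases prefer query_points over search)
hits = client.search(collection_name="knowledge", query_vector=query_embedding, limit=5)
chunks = [hit.payload["text"] for hit in hits]

# Cleanup orphaned vectors by point ID
client.delete(collection_name="knowledge", points_selector=PointIdsList(points=orphaned_ids))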
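
The cleanup call needs a list of orphaned point IDs, and the note does not say how that list is produced. One plausible approach, sketched here as an assumption, is to read each point's ID and "source" payload from the collection (for example by paging with client.scroll) and compare against the set of documents that still exist. The find_orphaned_ids helper is hypothetical.

```python
def find_orphaned_ids(points: list[tuple[str, str]], live_sources: set[str]) -> list[str]:
    """Return IDs of points whose source document no longer exists.

    points: (point_id, source) pairs read from the collection payloads.
    live_sources: filenames or URLs of documents still present in the app.
    """
    return [point_id for point_id, source in points if source not in live_sources]
```

The result can be passed to the delete call as the list of point IDs. Running this on a schedule keeps removed documents from resurfacing in search results.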

Firecrawl Site Crawl

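# Firecrawl v1 expects maxDepth at the top level (crawlOptions was the v0 shape); parameter style varies by firecrawl-py version
response = firecrawl.crawl_url(url, params={"maxDepth": 3})
# The v1 result holds the pages under "data", each with markdown and metadata (source URL inside metadata)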
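
Before indexing, crawled pages need to be turned into records that match the payload shape used in Qdrant. The sketch below assumes each page is a dict with a "markdown" field and a "metadata" dict, and it accepts the page URL from either a top-level "url" key or metadata["sourceURL"], since the exact shape depends on the Firecrawl API version. The pages_to_records name is hypothetical.

```python
def pages_to_records(pages: list[dict]) -> list[dict]:
    """Convert crawled pages into {text, source, title} records, skipping empty pages."""
    records = []
    for page in pages:
        markdown = (page.get("markdown") or "").strip()
        if not markdown:
            continue  # nothing to index for this page
        metadata = page.get("metadata") or {}
        url = page.get("url") or metadata.get("sourceURL") or ""
        records.append({"text": markdown, "source": url, "title": metadata.get("title", "")})
    return records
```

Storing the page URL as "source" means crawled content can be cleaned up by the same orphan check used for uploaded files.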

AI Pre-structuring (Enterprise Nexus)

  • Run LLM pass on raw document before chunking
  • Extracts structured sections, improves chunk boundaries
  • Batch: 10 pages at a time to stay within context window
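
The 10-pages-at-a-time batching can be sketched as below. The structure_batch callable stands in for the LLM pre-structuring call, whose real signature is not given in this note; both function names are hypothetical.

```python
from typing import Callable, Iterator, Sequence


def batched(pages: Sequence[str], batch_size: int = 10) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield pages[start:start + batch_size]


def structure_document(
    pages: Sequence[str],
    structure_batch: Callable[[str], str],
    batch_size: int = 10,
) -> str:
    """Run the pre-structuring pass batch by batch and join the results in order."""
    parts = [structure_batch("\n\n".join(batch)) for batch in batched(pages, batch_size)]
    return "\n\n".join(parts)
```

Each LLM call sees at most batch_size pages, so the prompt size stays bounded regardless of document length.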

Projects Using This Pattern

Gotchas & Lessons

  • Qdrant vectors are NOT automatically deleted when source documents are removed; implement orphaned vector cleanup
  • AI pre-structuring significantly improves RAG quality but adds ~2-3s per document at ingest
  • SharePoint sync token refresh must be proactive, because Celery beat jobs may run after the access token has expired
  • M365 delegated scopes + offline_access required for refresh tokens (Enterprise Nexus)
  • max_tokens of 8192 was too low for from-template PPTX generation in Sandbox; it was increased to 16000
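
The proactive token refresh lesson can be expressed as a small check that a scheduled job runs before calling SharePoint. This is a sketch under assumptions: the needs_refresh name and the five-minute safety margin are illustrative, and the actual refresh call (exchanging the refresh token for a new access token) is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_refresh(
    expires_at: datetime,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True when the access token is expired or will expire within `margin`."""
    current = now or datetime.now(timezone.utc)
    return expires_at - current <= margin
```

Checking against a margin rather than the exact expiry time covers the case where a job starts with a valid token that expires partway through the sync.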