---
title: RAG Architecture (Retrieval-Augmented Generation)
aliases:
  - rag
  - vector-search
  - knowledge-retrieval
  - qdrant
tags:
  - rag
  - qdrant
  - vector-search
  - llamaindex
  - firecrawl
  - architecture
sources:
  - 01 Projects/enterprise-ai-hub-nexus
  - 01 Projects/sandbox-notebookllamalm-nextjs
  - 01 Projects/Oliver-ai-bot_2.0
created: 2026-04-15
updated: 2026-04-15
---
RAG Architecture
Knowledge retrieval pipeline: ingest documents → chunk + embed → store in vector DB → retrieve relevant chunks → LLM synthesis.
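A minimal, self-contained sketch of those five stages. It assumes nothing about the real stack: a bag-of-words counter stands in for the embedding model, a Python list stands in for Qdrant, and the LLM call is reduced to building the synthesis prompt. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Toy embedding (stand-in for a real embedding model): bag-of-words term counts.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(text: str, size: int = 40) -> list[str]:
    # Fixed-size word windows; real chunkers usually respect headings and sentences.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# In-memory stand-in for the vector DB: (vector, payload) pairs.
store: list[tuple[Counter, dict]] = []

def ingest(doc: str, source: str) -> None:
    for piece in chunk(doc):
        store.append((embed(piece), {"text": piece, "source": source}))

def retrieve(query: str, k: int = 2) -> list[dict]:
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [payload for _, payload in ranked[:k]]

def build_prompt(query: str, hits: list[dict]) -> str:
    # Synthesis step: retrieved chunks become the grounding context for the LLM.
    context = "\n".join(f"[{h['source']}] {h['text']}" for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

ingest("Qdrant stores vectors and payloads for similarity search.", "qdrant.md")
ingest("Celery beat schedules periodic SharePoint sync jobs.", "sync.md")
hits = retrieve("How are vectors stored for similarity search?", k=1)
print(hits[0]["source"])  # qdrant.md
```

Swapping `embed` for a real model and `store` for a Qdrant collection leaves the shape of `ingest` and `retrieve` unchanged.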
Key Takeaways
- Qdrant is the vector DB of choice for Oliver RAG projects
- AI content structuring before indexing improves retrieval quality (Enterprise Nexus)
- Firecrawl /v1/crawl handles recursive website crawling for knowledge ingestion
- Batch processing (10 pages at a time) for document merging avoids context overflow
- Orphaned vectors accumulate over time — need periodic cleanup
- LlamaIndex abstracts multi-model RAG (Sandbox NotebookLM)
When to Use
When users need to query a large body of documents in natural language.
Architecture
Enterprise Nexus RAG Pipeline
```text
Sources:
- File upload (PDF, Word, etc.)
- SharePoint sync (Celery beat)
- Website crawl (Firecrawl /v1/crawl)
↓
AI content structuring (pre-processing pass)
↓
Qdrant (vector store)
↓ on query:
Vector similarity search → top-k chunks
↓
LLM synthesis (prompt + chunks)
↓
Response to user
```
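The three ingestion sources converge on one structure-then-index path. A minimal sketch of that fan-in, assuming a hypothetical `RawDocument` record and caller-supplied `structure` (the LLM pre-processing pass) and `index` (embed + upsert) functions; none of these names come from the Enterprise Nexus code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class RawDocument:
    source_type: str  # hypothetical labels: "upload", "sharepoint", "crawl"
    source_id: str    # e.g. filename, SharePoint item ID, or crawled URL
    text: str

def ingest_all(
    docs: Iterable[RawDocument],
    structure: Callable[[str], str],
    index: Callable[[str, dict], None],
) -> int:
    """Route every source through the same structure-then-index path."""
    indexed = 0
    for doc in docs:
        structured = structure(doc.text)  # AI content structuring pass
        # Keep the origin in the payload so vectors can be traced back to their source later
        index(structured, {"source_type": doc.source_type, "source": doc.source_id})
        indexed += 1
    return indexed

# Usage with stand-in callables
seen = []
n = ingest_all(
    [RawDocument("upload", "handbook.pdf", "raw text"),
     RawDocument("crawl", "https://example.com/faq", "faq text")],
    structure=str.upper,
    index=lambda text, payload: seen.append((text, payload)),
)
print(n, seen[0])
```

Recording the source in every payload is what later makes orphaned-vector cleanup possible: a vector can only be judged orphaned if it says where it came from.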
Sandbox NotebookLM RAG
```text
Document upload → LlamaIndex ingestion
↓
llm_factory.get_llm_by_type() → multi-model
↓
7 studio generators (flashcards, quiz, mindmap, slides, report, infographic, datatable)
↓
ElevenLabs (podcast audio synthesis)
```
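The `llm_factory.get_llm_by_type()` step lets every studio generator share one model lookup. The real factory is not shown in this note, so the following is a hypothetical sketch of the pattern: a registry mapping a type label to a constructor, with `EchoLLM` standing in for provider SDK clients. The labels `"fast"` and `"quality"` are invented for illustration.

```python
from typing import Callable, Protocol

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class EchoLLM:
    """Stand-in model; a real registry entry would wrap a provider SDK client."""
    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"

# Hypothetical registry: type label -> zero-argument constructor.
_REGISTRY: dict[str, Callable[[], LLM]] = {
    "fast": lambda: EchoLLM("small-model"),
    "quality": lambda: EchoLLM("large-model"),
}

def get_llm_by_type(llm_type: str) -> LLM:
    try:
        return _REGISTRY[llm_type]()
    except KeyError:
        raise ValueError(f"unknown llm type: {llm_type!r}") from None

def generate(generator: str, source_text: str, llm_type: str = "quality") -> str:
    # Each studio generator (flashcards, quiz, mindmap, ...) goes through the same lookup.
    llm = get_llm_by_type(llm_type)
    return llm.complete(f"Create {generator} from:\n{source_text}")
```

Adding a provider then means registering one constructor rather than touching each of the seven generators.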
Key Components
Qdrant Operations
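```python
from qdrant_client import QdrantClient
from qdrant_client.models import PointIdsList, PointStruct

client = QdrantClient(url="http://localhost:6333")  # assumed local instance; adjust to your deployment

# Upsert vectors: point IDs must be unsigned ints or UUIDs, unique per chunk
# (reusing one document ID for every chunk would overwrite earlier chunks)
client.upsert(collection_name="knowledge", points=[
    PointStruct(id=chunk_id, vector=embedding, payload={"text": chunk, "source": filename}),
])
# Search: query_points supersedes the deprecated client.search in qdrant-client >= 1.10
hits = client.query_points(collection_name="knowledge", query=query_embedding, limit=5).points
chunks = [hit.payload["text"] for hit in hits]
# Cleanup orphaned vectors by explicit ID list
client.delete(collection_name="knowledge", points_selector=PointIdsList(points=orphaned_ids))
```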
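Finding `orphaned_ids` means comparing each point's payload `source` with what still exists upstream. A sketch assuming payloads carry a `"source"` field and that `live_sources` (a hypothetical set of filenames still present) is available; `iter_points` pages through the collection with a callable shaped like qdrant-client's `client.scroll`, which returns `(records, next_page_offset)` and signals the last page with `None`. A fake scroll keeps the example runnable without a server.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Iterator

def iter_points(scroll: Callable, collection: str, page_size: int = 256) -> Iterator[tuple]:
    """Yield (point_id, source) for every point, following scroll offsets page by page."""
    offset = None
    while True:
        records, offset = scroll(
            collection_name=collection, limit=page_size, offset=offset,
            with_payload=True, with_vectors=False,
        )
        for record in records:
            yield record.id, (record.payload or {}).get("source")
        if offset is None:  # no further pages
            break

def find_orphaned_ids(points: Iterable[tuple], live_sources: set) -> list:
    """Points whose payload source no longer exists upstream are orphans."""
    return [pid for pid, source in points if source not in live_sources]

# Stand-in for client.scroll: two pages, one point from a deleted file
_pages = {
    None: ([SimpleNamespace(id=1, payload={"source": "a.pdf"}),
            SimpleNamespace(id=2, payload={"source": "deleted.pdf"})], "page-2"),
    "page-2": ([SimpleNamespace(id=3, payload={"source": "b.pdf"})], None),
}

def fake_scroll(collection_name, limit, offset, with_payload, with_vectors):
    return _pages[offset]

orphaned = find_orphaned_ids(iter_points(fake_scroll, "knowledge"), {"a.pdf", "b.pdf"})
print(orphaned)  # [2]
```

Against a live collection, `iter_points(client.scroll, "knowledge")` replaces the fake, and the resulting IDs feed the `client.delete(...)` call.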
Firecrawl Site Crawl
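```python
# /v1/crawl takes maxDepth at the top level ("crawlOptions" was the v0 shape); SDK signatures vary by firecrawl-py version
response = firecrawl.crawl_url(url, params={"maxDepth": 3})
# Pages come back as {markdown, metadata}; the page URL is in metadata["sourceURL"]
```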
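Under the SDK call, a /v1 crawl is an asynchronous job: a POST to /v1/crawl returns a job `id`, then GET /v1/crawl/{id} is polled until `status` is `"completed"`, with large results paginated through a `next` URL. A sketch under those assumptions (verify against the current Firecrawl API reference); `crawl_site` is a hypothetical helper, and HTTP is injected as `post`/`get` callables so the flow runs without network access.

```python
import time
from typing import Callable

API = "https://api.firecrawl.dev/v1/crawl"

def crawl_site(url: str, post: Callable, get: Callable, max_depth: int = 3,
               poll_seconds: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> list[dict]:
    """Start a crawl job, poll until it finishes, return [{url, markdown}] per page."""
    job = post(API, {"url": url, "maxDepth": max_depth})
    status_url = f"{API}/{job['id']}"
    while True:
        status = get(status_url)
        if status["status"] == "completed":
            break
        if status["status"] == "failed":
            raise RuntimeError(f"crawl failed for {url}")
        sleep(poll_seconds)  # still running: wait before polling again
    pages = list(status.get("data", []))
    next_url = status.get("next")
    while next_url:  # large results are split across "next" pages
        more = get(next_url)
        pages.extend(more.get("data", []))
        next_url = more.get("next")
    return [
        {"url": p.get("metadata", {}).get("sourceURL"), "markdown": p.get("markdown", "")}
        for p in pages
    ]
```

With `requests`, the callables could be `lambda u, body: requests.post(u, json=body, headers=auth).json()` and `lambda u: requests.get(u, headers=auth).json()`, where `auth` holds a Bearer API key.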
AI Pre-structuring (Enterprise Nexus)
- Run LLM pass on raw document before chunking
- Extracts structured sections, improves chunk boundaries
- Batch: 10 pages at a time to stay within context window
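The batching rule can be sketched as follows, assuming a caller-supplied `structure_batch` function that makes one LLM call per batch; the helper names are hypothetical. Python 3.12's `itertools.batched` performs the same slicing.

```python
from typing import Callable, Iterator, Sequence

def batched_pages(pages: Sequence[str], size: int = 10) -> Iterator[Sequence[str]]:
    # Consecutive slices of at most `size` pages; the last batch may be shorter.
    for start in range(0, len(pages), size):
        yield pages[start:start + size]

def prestructure(pages: Sequence[str], structure_batch: Callable[[str], str],
                 size: int = 10) -> str:
    """Run the structuring pass once per batch, then merge the structured sections."""
    sections = []
    for batch in batched_pages(pages, size):
        sections.append(structure_batch("\n\n".join(batch)))  # one LLM call per batch
    return "\n\n".join(sections)
```

A 23-page document therefore costs three structuring calls (10 + 10 + 3 pages), each small enough to stay inside the context window.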
Projects Using This Pattern
Gotchas & Lessons
- Qdrant vectors are NOT automatically deleted when source documents are removed — implement orphaned vector cleanup
- AI pre-structuring significantly improves RAG quality but adds ~2-3s per document at ingest
- SharePoint sync tokens must be refreshed proactively: Celery beat jobs can run after the access token has already expired
- M365 delegated scopes plus offline_access are required to obtain refresh tokens (Enterprise Nexus)
- max_tokens of 8192 was too low for from-template PPTX generation in Sandbox; increased to 16000
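The proactive-refresh lesson can be sketched as a pre-flight check that each beat-scheduled sync runs before calling SharePoint. The token dict shape, the 10-minute margin, and the `refresh` callable are assumptions; in practice the refresh would go through the Microsoft identity platform (for example via MSAL).

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

REFRESH_MARGIN = timedelta(minutes=10)  # assumed safety margin; tune to job duration

def needs_refresh(expires_at: datetime, now: Optional[datetime] = None,
                  margin: timedelta = REFRESH_MARGIN) -> bool:
    """True if the token expires within `margin` of `now`, or already has."""
    now = now or datetime.now(timezone.utc)
    return expires_at - now <= margin

def ensure_fresh_token(token: dict, refresh: Callable[[str], dict],
                       now: Optional[datetime] = None) -> dict:
    """Call at the start of each sync job, before touching SharePoint.

    `token` is a hypothetical dict with "expires_at" and "refresh_token";
    `refresh` exchanges the refresh token for a new token dict. A refresh
    token is only issued when offline_access was requested.
    """
    if needs_refresh(token["expires_at"], now):
        return refresh(token["refresh_token"])
    return token
```

Checking against a margin rather than the exact expiry covers jobs that start with a still-valid token and then run past its lifetime.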
Related