obsidian/wiki/architecture/multi-agent-ai-systems.md
2026-04-15 10:48:47 +01:00


title: Multi-Agent AI Systems
aliases: multi-agent, ai-agents, parallel-agents
tags: ai, multi-agent, architecture, gemini, gpt
sources: 01 Projects/modcomms, 01 Projects/semblance, 01 Projects/enterprise-ai-hub-nexus
created: 2026-04-15
updated: 2026-04-15

Multi-Agent AI Systems

A pattern in which multiple specialized AI agents run in parallel and a lead (orchestrator) agent synthesizes their results.

Key Takeaways

  • Parallel specialist agents + lead synthesizer = reliable, multi-perspective analysis
  • HTTP polling (not WebSocket) for delivering async agent results on GCP, where the load balancer's 30 s timeout cuts off long-lived connections
  • Each agent should have a focused, single-concern prompt (Legal, Brand, Tone, Channel)
  • Autonomous orchestration needs explicit "next speaker" logic with termination conditions to prevent infinite loops
  • Background task execution (ai_runner_service.py) keeps agents non-blocking
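
The polling and background-execution takeaways can be sketched together as a minimal in-memory job store, assuming asyncio: a request starts the agent work as a background task and returns a job id at once, and the client polls that id with short requests instead of holding one connection open past the load balancer timeout. `JobStore`, `Job`, `submit` and `poll` are hypothetical names for illustration, not the actual ai_runner_service.py API, and a multi-instance deployment would need shared storage rather than a dict.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class Job:
    """What a client sees when it polls (hypothetical shape)."""
    status: str = "pending"  # pending -> done | failed
    result: Optional[Any] = None
    error: Optional[str] = None


class JobStore:
    """In-memory registry; a real deployment would need shared storage."""

    def __init__(self) -> None:
        self._jobs: dict[str, Job] = {}
        self._tasks: set = set()  # keep task references so they are not garbage-collected

    def submit(self, work: Callable[[], Awaitable[Any]]) -> str:
        """Start `work` in the background and return a job id immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job()
        task = asyncio.create_task(self._run(job_id, work))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return job_id

    async def _run(self, job_id: str, work: Callable[[], Awaitable[Any]]) -> None:
        job = self._jobs[job_id]
        try:
            job.result = await work()
            job.status = "done"
        except Exception as exc:  # record the failure instead of losing it
            job.error = str(exc)
            job.status = "failed"

    def poll(self, job_id: str) -> Job:
        """What a short-lived GET /jobs/{id} handler would return."""
        return self._jobs[job_id]


async def demo() -> Job:
    store = JobStore()

    async def slow_agent_run() -> str:
        await asyncio.sleep(0.05)  # stands in for a long LLM call
        return "verdict: approved"

    job_id = store.submit(slow_agent_run)
    while store.poll(job_id).status == "pending":
        await asyncio.sleep(0.01)  # client-side polling interval
    return store.poll(job_id)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Each poll is a fresh, short HTTP request, so no single connection ever approaches the 30 s limit regardless of how long the agents take.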

When to Use

  • Content review requiring multiple compliance dimensions (legal + brand + tone)
  • Focus group simulation (multi-persona conversations)
  • Any task benefiting from multiple independent perspectives before synthesis

Architecture Patterns

Pattern 1: Parallel Specialists + Lead (Mod Comms)

Input (proof image/PDF)
    ↓
┌──────────────────────────────────┐
│ Agent 1: Legal compliance        │
│ Agent 2: Brand adherence         │  ← run in PARALLEL
│ Agent 3: Tone of voice           │
│ Agent 4: Channel suitability     │
└──────────────────────────────────┘
    ↓
Lead Agent: synthesize verdict
    ↓
Result (via HTTP polling)
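
The fan-out/fan-in shape of Pattern 1 can be sketched with `asyncio.gather`: four single-concern specialists run concurrently, then one synthesis step combines their findings. This is a minimal sketch under stated assumptions. The prompts, `TOY_RULES`, `run_specialist` and `lead_synthesize` are hypothetical stand-ins (the keyword rules replace real LLM calls so the example is deterministic), and in Mod Comms the lead step is itself an agent rather than a fixed rule.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    agent: str
    passed: bool
    notes: str


# Hypothetical single-concern prompts: each specialist sees only its own concern.
SPECIALIST_PROMPTS = {
    "legal": "Review this proof ONLY for legal/compliance risks.",
    "brand": "Review this proof ONLY for brand-guideline adherence.",
    "tone": "Review this proof ONLY for tone of voice.",
    "channel": "Review this proof ONLY for suitability to its channel.",
}

# Toy rules (agent -> phrase that fails the check) so the sketch runs without a model.
TOY_RULES = {"legal": "guaranteed", "brand": "comic sans", "tone": "!!!", "channel": "tiktok"}


async def run_specialist(agent: str, prompt: str, content: str) -> Finding:
    """Stand-in for one LLM call; a real version would send `prompt` with `content`."""
    await asyncio.sleep(0.01)  # simulate model latency
    phrase = TOY_RULES[agent]
    hit = phrase in content.lower()
    return Finding(agent, passed=not hit, notes=f"found {phrase!r}" if hit else "ok")


def lead_synthesize(findings: list[Finding]) -> dict:
    """Stand-in for the lead agent: approve only if every specialist passed."""
    failures = [f for f in findings if not f.passed]
    return {
        "verdict": "changes requested" if failures else "approved",
        "issues": {f.agent: f.notes for f in failures},
    }


async def review(content: str) -> dict:
    # Fan out: all specialists run concurrently; gather returns results in input order.
    findings = await asyncio.gather(
        *(run_specialist(name, prompt, content) for name, prompt in SPECIALIST_PROMPTS.items())
    )
    # Fan in: a single synthesis step over every finding.
    return lead_synthesize(list(findings))


if __name__ == "__main__":
    print(asyncio.run(review("Results guaranteed!")))
```

By default `gather` propagates the first exception raised by any specialist; passing `return_exceptions=True` would let the lead still synthesize a verdict from the agents that did succeed.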

Pattern 2: Autonomous Multi-Persona (Semblance)

Input (discussion brief)
    ↓
Persona generator (Gemini) → N personas
    ↓
Conversation controller
    ├── conversation_decision_service.py  ← next speaker logic
    ├── conversation_context_service.py   ← shared state + history
    └── ai_runner_service.py              ← background task execution
    ↓
Socket.IO → frontend (real-time)
    ↓
Theme extraction + analytics
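
The "next speaker" decision in Pattern 2 can be sketched as a pure function that either names the next persona or returns `None` to stop. This assumes a simple heuristic (least-spoken persona, never the same one twice in a row) plus two termination conditions, a turn budget and a repetition check. `ConversationState`, `next_speaker`, `should_stop` and `run_conversation` are hypothetical names; conversation_decision_service.py may well ask an LLM to choose the speaker instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ConversationState:
    personas: list[str]
    history: list[tuple[str, str]] = field(default_factory=list)  # (speaker, message)
    max_turns: int = 12


def should_stop(state: ConversationState) -> bool:
    """Termination conditions; without these an autonomous loop can run forever."""
    if len(state.history) >= state.max_turns:
        return True  # hard turn budget
    # Stop if the last three messages are identical (the conversation is stuck).
    recent = [msg for _, msg in state.history[-3:]]
    return len(recent) == 3 and len(set(recent)) == 1


def next_speaker(state: ConversationState) -> Optional[str]:
    """Return the next persona to speak, or None to end the conversation."""
    if should_stop(state):
        return None
    spoken = [speaker for speaker, _ in state.history]
    last = spoken[-1] if spoken else None
    # Never the same persona twice in a row (unless there is only one persona).
    candidates = [p for p in state.personas if p != last] or state.personas
    # Prefer whoever has spoken least; ties go to persona order.
    return min(candidates, key=lambda p: (spoken.count(p), state.personas.index(p)))


def run_conversation(
    state: ConversationState,
    generate: Callable[[str, list[tuple[str, str]]], str],
) -> list[tuple[str, str]]:
    """Drive turns until next_speaker() signals termination."""
    while (speaker := next_speaker(state)) is not None:
        state.history.append((speaker, generate(speaker, state.history)))
    return state.history
```

Keeping the decision separate from message generation makes the termination logic testable without calling a model: `generate` can be any function, including a stub.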

Pattern 3: RAG + Structured AI (Enterprise Nexus)

Document upload → Firecrawl crawl
    ↓
AI content structuring (pre-indexing)
    ↓
Qdrant vector DB (10-page batch merge)
    ↓
Query → vector search → LLM synthesis
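
The final query step of Pattern 3 can be sketched as exact cosine-similarity search followed by prompt assembly. This is an illustrative, dependency-free stand-in: Qdrant performs the search server-side with an approximate index, the toy two-dimensional vectors replace real embeddings, and `search` and `build_prompt` are hypothetical helpers rather than the Enterprise Nexus code. The "10-page batch merge" indexing step is not modelled here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: list[dict], top_k: int = 2) -> list[dict]:
    """Exact brute-force search; a vector DB answers this with an ANN index instead."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item["vector"]), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, hits: list[dict]) -> str:
    """Ground the LLM in retrieved chunks; chunk ids let the answer cite its sources."""
    context = "\n\n".join(f"[{h['id']}] {h['text']}" for h in hits)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    index = [
        {"id": "a", "vector": [1.0, 0.0], "text": "Refunds are issued within 14 days."},
        {"id": "b", "vector": [0.0, 1.0], "text": "Shipping takes 3-5 days."},
        {"id": "c", "vector": [0.9, 0.1], "text": "Returns must be unused."},
    ]
    # The query vector would come from the same embedding model used at indexing time.
    hits = search([1.0, 0.0], index)
    print(build_prompt("How do refunds work?", hits))
```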

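Projects Using This Pattern

  • Mod Comms (01 Projects/modcomms): parallel specialists + lead (Pattern 1)
  • Semblance (01 Projects/semblance): autonomous multi-persona conversations (Pattern 2)
  • Enterprise Nexus (01 Projects/enterprise-ai-hub-nexus): RAG + structured AI (Pattern 3)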

Gotchas & Lessons

  • The GCP load balancer's 30 s timeout kills streaming delivery, so always use HTTP polling for agent results (see wiki/architecture/gcp-deployment-lb-timeout)
  • Semblance crashed when mixing naive and timezone-aware datetimes; always use timezone-aware datetimes in async contexts
  • "Next speaker" logic in autonomous mode must have termination conditions to prevent infinite loops
  • In Semblance, emitting WebSocket events from a different event loop (cross-loop emit) was unreliable; a polling fallback proved more stable
  • Orphaned vectors in Qdrant need periodic cleanup (Enterprise Nexus has a "Purge orphaned vectors" button)
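
Two of the gotchas above can be sketched in plain Python. The datetime helpers show the aware-only convention: subtracting an aware datetime from a naive one raises `TypeError`, so `seconds_since` rejects naive input up front. `find_orphans` shows the set-difference check behind a purge, assuming (hypothetically) that each stored vector records its source document id; the actual deletion call to Qdrant is deliberately left out. All function names here are illustrative.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Timezone-aware 'now'. datetime.utcnow() returns a naive value and is deprecated in 3.12+."""
    return datetime.now(timezone.utc)


def seconds_since(ts: datetime) -> float:
    """Reject naive datetimes up front instead of failing later with a TypeError."""
    if ts.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before doing arithmetic")
    return (utc_now() - ts).total_seconds()


def find_orphans(point_to_doc: dict[str, str], live_doc_ids: set[str]) -> list[str]:
    """IDs of stored vectors whose source document no longer exists (candidates to purge)."""
    return sorted(pid for pid, doc_id in point_to_doc.items() if doc_id not in live_doc_ids)


if __name__ == "__main__":
    try:
        datetime(2026, 4, 15) - utc_now()  # naive minus aware
    except TypeError as exc:
        print("TypeError:", exc)
    print(find_orphans({"p1": "doc1", "p2": "doc2", "p3": "doc9"}, {"doc1", "doc2"}))
```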