| title | aliases | tags | sources | created | updated |
| --- | --- | --- | --- | --- | --- |
| Multi-Agent AI Systems | multi-agent, ai-agents, parallel-agents | ai, multi-agent, architecture, gemini, gpt | 01 Projects/modcomms, 01 Projects/semblance, 01 Projects/enterprise-ai-hub-nexus | 2026-04-15 | 2026-04-15 |
Multi-Agent AI Systems
Pattern where multiple specialized AI agents run in parallel, with a lead/orchestrator agent synthesizing results.
Key Takeaways
- Parallel specialist agents + lead synthesizer = reliable, multi-perspective analysis
- HTTP polling (not WebSocket) for delivering async agent results on GCP
- Each agent should have a focused, single-concern prompt (Legal, Brand, Tone, Channel)
- Autonomous orchestration needs explicit "next speaker" logic to prevent infinite loops
- Background task execution (ai_runner_service.py) keeps agents non-blocking
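The polling and background-execution takeaways can be sketched together. This is a minimal, framework-free illustration and not the actual contents of ai_runner_service.py: `submit_job`, `get_job`, and the in-memory `_jobs` store are hypothetical names, and a real deployment would likely use a shared store instead of a process-local dict.

```python
import threading
import uuid

# Hypothetical in-memory job store; process-local, for illustration only.
_jobs: dict[str, dict] = {}
_lock = threading.Lock()


def submit_job(work, *args) -> str:
    """Start `work(*args)` in a background thread and return a job id at once."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"status": "pending", "result": None, "error": None}

    def runner() -> None:
        try:
            update = {"status": "done", "result": work(*args), "error": None}
        except Exception as exc:  # record the failure instead of losing it
            update = {"status": "failed", "result": None, "error": str(exc)}
        with _lock:
            _jobs[job_id] = update

    threading.Thread(target=runner, daemon=True).start()
    return job_id


def get_job(job_id: str) -> dict:
    """What a polling endpoint would return: a snapshot of the job state."""
    with _lock:
        job = _jobs.get(job_id)
        if job is None:
            return {"status": "unknown", "result": None, "error": None}
        return dict(job)
```

A client would call `submit_job` once, then request `get_job` on a short interval until the status is no longer `pending`. Each poll is a quick request/response, so no single connection has to stay open for the duration of the agent run.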
When to Use
- Content review requiring multiple compliance dimensions (legal + brand + tone)
- Focus group simulation (multi-persona conversations)
- Any task benefiting from multiple independent perspectives before synthesis
Architecture Patterns
Pattern 1: Parallel Specialists + Lead (Mod Comms)
Input (proof image/PDF)
↓
┌──────────────────────────────────┐
│ Agent 1: Legal compliance │
│ Agent 2: Brand adherence │ ← run in PARALLEL
│ Agent 3: Tone of voice │
│ Agent 4: Channel suitability │
└──────────────────────────────────┘
↓
Lead Agent: synthesize verdict
↓
Result (via HTTP polling)
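The fan-out/fan-in flow in Pattern 1 could look roughly like the sketch below. It assumes an async runtime and uses `call_model` as a hypothetical stand-in for the real LLM client; the prompt strings are illustrative, not the prompts used in Mod Comms.

```python
import asyncio

# One focused, single-concern prompt per specialist (illustrative wording).
SPECIALISTS = {
    "legal": "Check the proof for legal compliance issues.",
    "brand": "Check the proof for brand adherence.",
    "tone": "Check the proof for tone of voice.",
    "channel": "Check the proof for channel suitability.",
}


async def call_model(prompt: str, payload: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[{prompt}] reviewed: {payload}"


async def run_specialist(name: str, prompt: str, payload: str) -> tuple[str, str]:
    return name, await call_model(prompt, payload)


async def review(payload: str) -> dict:
    # Fan out: all specialists run concurrently on the same input.
    tasks = [run_specialist(n, p, payload) for n, p in SPECIALISTS.items()]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    findings: dict[str, str] = {}
    errors: list[str] = []
    for item in results:
        if isinstance(item, Exception):
            errors.append(repr(item))  # one failed agent does not sink the rest
        else:
            name, text = item
            findings[name] = text

    # Fan in: the lead agent sees every specialist finding at once.
    verdict = await call_model(
        "Synthesize a single verdict from these findings.",
        "\n".join(findings.values()),
    )
    return {"findings": findings, "errors": errors, "verdict": verdict}
```

Passing `return_exceptions=True` to `asyncio.gather` is a design choice here: a failure in one specialist is reported alongside the other findings, and the lead agent can still synthesize from whatever succeeded.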
Pattern 2: Autonomous Multi-Persona (Semblance)
Input (discussion brief)
↓
Persona generator (Gemini) → N personas
↓
Conversation controller
├── conversation_decision_service.py ← next speaker logic
├── conversation_context_service.py ← shared state + history
└── ai_runner_service.py ← background task execution
↓
Socket.IO → frontend (real-time)
↓
Theme extraction + analytics
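As a rough illustration of the "next speaker" step, the sketch below picks the least-heard persona that did not speak last, and returns `None` when the conversation should stop. It is not the logic in conversation_decision_service.py: the `Conversation` class, the `max_turns` cap, and the "two empty turns" rule are all assumptions chosen to show explicit termination conditions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Conversation:
    personas: list[str]
    max_turns: int = 12  # hard cap, so the loop always terminates
    history: list[tuple[str, str]] = field(default_factory=list)


def next_speaker(conv: Conversation) -> Optional[str]:
    """Return the next persona to speak, or None to end the conversation."""
    if not conv.personas or len(conv.history) >= conv.max_turns:
        return None
    # Assumed stall rule: stop after two consecutive empty contributions.
    if len(conv.history) >= 2 and all(not text.strip() for _, text in conv.history[-2:]):
        return None
    counts = {p: 0 for p in conv.personas}
    for speaker, _ in conv.history:
        if speaker in counts:
            counts[speaker] += 1
    last = conv.history[-1][0] if conv.history else None
    # Avoid back-to-back turns unless there is only one persona.
    candidates = [p for p in conv.personas if p != last] or conv.personas
    return min(candidates, key=lambda p: counts[p])


def run(conv: Conversation, speak: Callable[[str, list], str]) -> list[tuple[str, str]]:
    """Drive the conversation until next_speaker signals termination."""
    while (speaker := next_speaker(conv)) is not None:
        conv.history.append((speaker, speak(speaker, conv.history)))
    return conv.history
```

With this shape, the controller loop has no exit condition of its own. It stops only when `next_speaker` returns `None`, which keeps every termination rule in one place.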
Pattern 3: RAG + Structured AI (Enterprise Nexus)
Document upload → Firecrawl crawl
↓
AI content structuring (pre-indexing)
↓
Qdrant vector DB (10-page batch merge)
↓
Query → vector search → LLM synthesis
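The note does not spell out what "10-page batch merge" does. One plausible reading, shown here purely as an assumption, is that crawled pages are merged in groups of ten before indexing. The helper name `batch_pages` and the blank-line separator are hypothetical, and no Qdrant calls are shown.

```python
from typing import Iterator


def batch_pages(pages: list[str], batch_size: int = 10) -> Iterator[str]:
    """Yield merged text for each consecutive group of `batch_size` pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        # Assumed merge rule: join page texts with a blank line between them.
        yield "\n\n".join(pages[start:start + batch_size])
```

Each merged batch would then go through structuring, embedding, and upsert. The last batch may hold fewer than ten pages.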
Projects Using This Pattern
Gotchas & Lessons
- GCP 30s LB timeout kills streaming delivery — always use HTTP polling for agent results (see wiki/architecture/gcp-deployment-lb-timeout)
- Semblance: mixing naive and timezone-aware datetimes caused a crash (Python raises TypeError when the two are ordered or subtracted); always use timezone-aware datetimes in async contexts
- "Next speaker" logic in autonomous mode must have termination conditions to prevent infinite loops
- Cross-loop WebSocket emit in Semblance was unreliable — polling fallback was more stable
- Orphaned vectors in Qdrant need periodic cleanup (Enterprise Nexus has a "Purge orphaned vectors" button)
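The datetime gotcha above is easy to reproduce. The sketch below shows the failure and one way to guard against it. `utc_now` and `ensure_aware` are hypothetical helper names, and treating naive values as UTC is an assumption that only holds if every writer actually stored UTC.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Always produce a timezone-aware timestamp."""
    return datetime.now(timezone.utc)


def ensure_aware(dt: datetime) -> datetime:
    """Attach UTC to naive datetimes (assumes they were recorded in UTC)."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt


naive = datetime(2026, 4, 15, 12, 0)
aware = datetime(2026, 4, 15, 12, 0, tzinfo=timezone.utc)

try:
    elapsed = aware - naive  # mixing the two kinds raises TypeError
except TypeError:
    elapsed = aware - ensure_aware(naive)  # normalize first, then subtract
```

Normalizing at the boundary, where timestamps enter from the database or a request, keeps the rest of the code free of per-call checks.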
Related