obsidian/wiki/tech-patterns/python-ai-agents.md
2026-04-15 10:48:47 +01:00

---
title: "Python AI Integration Patterns"
aliases: [ai-agents, claude-api, gemini-api, openai, llm-integration]
tags: [ai, claude, gemini, openai, llamaindex, python, agents]
sources: [01 Projects/gmal-scope-builder, 01 Projects/modcomms, 01 Projects/semblance, 01 Projects/enterprise-ai-hub-nexus, 01 Projects/olivas, 01 Projects/pimco-charts, 01 Projects/video-accessibility, 01 Projects/pdf-accessibility, 01 Projects/build-a-squad, 01 Projects/ac-helper]
created: 2026-04-15
updated: 2026-04-15
---
# Python AI Integration Patterns
How Oliver projects integrate LLMs — model selection, structured output, multi-model fallback.
## Key Takeaways
- **Claude** (Anthropic): `tool_use` for structured JSON output — used in GMAL, OliVAS, PIMCO, PDF Accessibility
- **Gemini** (Google): dominant model for real-time/streaming analysis — Mod Comms, Video Accessibility, AC Helper, Build A Squad
- **OpenAI**: used in Semblance (GPT-4.1, GPT-5.2) and Solventum
- **LlamaIndex**: RAG orchestration in Sandbox NotebookLM (multi-model via `llm_factory.py`)
- **DeepGaze**: specialized vision model for OliVAS (saliency/attention prediction)
- Always implement a **fallback model**: Gemini Pro → Flash (Mod Comms pattern)
## When to Use Which Model
| Task | Model | Why |
|------|-------|-----|
| Structured JSON output | Claude (`tool_use`) | Reliable schema adherence |
| Document analysis / image | Gemini 2.5 Pro | Multimodal, fast |
| Real-time persona/focus groups | Gemini 3 Pro Preview / GPT-4.1 | Quality personas |
| Metadata generation | OpenAI (Solventum) | Client preference |
| RAG retrieval | LlamaIndex + any model | Framework abstraction |
| Chart/SVG generation | Claude API | Precise code output |
| Prompt optimization | Claude Sonnet 4.6 | Design analysis |
| Video captions | Gemini 2.5 Pro | Native video understanding |
## Structured Output — Claude `tool_use`
```python
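# GMAL pattern: Claude tool_use for schema-bound output
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,  # required by the Messages API
    # input_schema must be a JSON Schema dict (for a Pydantic model, pass .model_json_schema())
    tools=[{"name": "parse_brief", "input_schema": BriefSchema}],
    tool_choice={"type": "tool", "name": "parse_brief"},  # force this tool
    messages=[{"role": "user", "content": brief_text}],
)
# Select the tool_use block instead of assuming it is the first content block
result = next(b.input for b in response.content if b.type == "tool_use")  # dict matching the schema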
```
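A minimal sketch of the pieces around that call, assuming `BriefSchema` is a plain JSON Schema dict and that response content blocks expose `type`, `name`, and `input` attributes. The schema fields and the helpers `extract_tool_input` and `missing_required` are hypothetical and not taken from GMAL; `SimpleNamespace` objects stand in for a live API response so the example runs offline.

```python
from types import SimpleNamespace

# Hypothetical schema; the real GMAL BriefSchema is not shown in this note.
BriefSchema = {
    "type": "object",
    "properties": {
        "client": {"type": "string"},
        "deliverables": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["client", "deliverables"],
}


def extract_tool_input(content_blocks, tool_name):
    """Return the input dict of the first tool_use block for tool_name."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return block.input
    raise ValueError(f"no tool_use block for {tool_name!r}")


def missing_required(payload, schema):
    """List required keys absent from payload (shallow check only)."""
    return [key for key in schema.get("required", []) if key not in payload]


# Stand-in for response.content: a text block followed by the tool call.
fake_content = [
    SimpleNamespace(type="text", text="Parsing the brief."),
    SimpleNamespace(
        type="tool_use",
        name="parse_brief",
        input={"client": "Acme", "deliverables": ["landing page"]},
    ),
]

result = extract_tool_input(fake_content, "parse_brief")
assert missing_required(result, BriefSchema) == []
```

The shallow required-key check is only a cheap guard; full validation would need a JSON Schema validator or a Pydantic model.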
## Multi-Model Fallback (Mod Comms Pattern)
```python
# Primary: Gemini Pro, fallback: Flash
try:
    result = await gemini_pro.generate(prompt)
except Exception:
    # Any Pro failure (e.g. quota, timeout, 5xx) falls back to Flash
    result = await gemini_flash.generate(prompt)
```
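The same idea can be wrapped in a reusable helper. The sketch below is an assumption-laden illustration, not the Mod Comms code: `generate_with_fallback`, the `timeout_s` parameter, and the two fake model coroutines are hypothetical names, and the timeout is an addition so that a hung primary call also triggers the fallback.

```python
import asyncio


async def generate_with_fallback(prompt, primary, fallback, timeout_s=30.0):
    """Try primary; on any error or timeout, retry once on fallback."""
    try:
        return await asyncio.wait_for(primary(prompt), timeout=timeout_s)
    except Exception as exc:  # a wait_for timeout is also caught here
        print(f"primary failed ({type(exc).__name__}); using fallback")
        return await fallback(prompt)


# Hypothetical stand-ins for the Gemini Pro and Flash clients.
async def fake_pro(prompt):
    raise RuntimeError("quota exceeded")


async def fake_flash(prompt):
    return f"flash:{prompt}"


result = asyncio.run(generate_with_fallback("hello", fake_pro, fake_flash))
```

Passing the models as callables keeps the helper independent of any one SDK. If the fallback also fails, its exception propagates to the caller.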
## LLM Factory Pattern (Sandbox NotebookLM)
```python
# llm_factory.py — abstract model selection
def get_llm_by_type(llm_type: str) -> BaseLLM:
    mapping = {
        "gemini-pro": GeminiPro(),
        "gpt4": OpenAI("gpt-4.1"),
        # ... (remaining entries elided)
    }
    return mapping[llm_type]

def get_structured_llm(schema: type) -> BaseLLM:
    ...  # wraps output in Pydantic model
```
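One possible refinement, sketched under assumptions: the mapping above constructs every model on each call, so a registry of constructors builds only the requested one and can report unknown types clearly. `StubLLM`, `_REGISTRY`, and the model ID strings are hypothetical stand-ins, not the contents of the real `llm_factory.py`.

```python
from typing import Callable, Dict


class StubLLM:
    """Hypothetical stand-in for a real LLM client class."""

    def __init__(self, model: str) -> None:
        self.model = model


# Map each type to a constructor so a model is built only when requested.
# The model ID strings here are illustrative assumptions.
_REGISTRY: Dict[str, Callable[[], StubLLM]] = {
    "gemini-pro": lambda: StubLLM("gemini-2.5-pro"),
    "gpt4": lambda: StubLLM("gpt-4.1"),
}


def get_llm_by_type(llm_type: str) -> StubLLM:
    """Build the requested model, or fail with the list of known types."""
    try:
        factory = _REGISTRY[llm_type]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(
            f"unknown llm_type {llm_type!r}; expected one of: {known}"
        ) from None
    return factory()


llm = get_llm_by_type("gpt4")
```

Keeping model IDs in a single registry also gives one place to update when provider model names change.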
## AI Agents (Mod Comms Multi-Agent)
```
4 specialist agents (Legal, Brand, Tone, Channel)
→ run in parallel
→ Lead agent synthesizes verdict
```
See [[wiki/architecture/multi-agent-ai-systems|multi-agent-ai-systems]] for full pattern.
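The fan-out and synthesis flow above can be sketched with `asyncio.gather`. This is an illustration under assumptions, not the Mod Comms implementation: the specialists here are hypothetical stubs that apply a toy rule instead of calling a model, and the lead agent is reduced to a plain function that approves only when every specialist approves.

```python
import asyncio

SPECIALISTS = ["Legal", "Brand", "Tone", "Channel"]


async def run_specialist(name: str, draft: str) -> dict:
    """Hypothetical specialist: flags drafts containing 'guarantee'."""
    await asyncio.sleep(0)  # placeholder for the real model call
    approved = "guarantee" not in draft.lower()
    return {"agent": name, "approved": approved}


def synthesize(reviews: list) -> dict:
    """Lead agent stand-in: approve only if every specialist approves."""
    blockers = [r["agent"] for r in reviews if not r["approved"]]
    return {"approved": not blockers, "blockers": blockers}


async def review(draft: str) -> dict:
    # gather runs the four specialists concurrently and preserves input order.
    reviews = await asyncio.gather(
        *(run_specialist(name, draft) for name in SPECIALISTS)
    )
    return synthesize(reviews)


verdict = asyncio.run(review("Our new plan launches Monday."))
```

With real model calls, total latency is roughly that of the slowest specialist plus the lead agent, not the sum of all five.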
## Projects Using This Pattern
- [[01 Projects/gmal-scope-builder/GMAL Scope Builder|GMAL Scope Builder]] — Claude Opus 4.6, tool_use, structured ratecard output
- [[01 Projects/modcomms/Mod Comms|Mod Comms]] — Gemini Pro + Flash fallback, 4 parallel agents
- [[01 Projects/semblance/Semblance|Semblance]] — Gemini 3 Pro Preview, GPT-4.1, GPT-5.2, multi-persona
- [[01 Projects/enterprise-ai-hub-nexus/Enterprise AI Hub Nexus|Enterprise Nexus]] — multi-model RAG via LlamaIndex + Qdrant
- [[01 Projects/sandbox-notebookllamalm-nextjs/Sandbox NotebookLM|Sandbox NotebookLM]] — LlamaIndex, llm_factory, 7 studio generators
- [[01 Projects/olivas/OliVAS|OliVAS]] — Claude Sonnet 4.6 design analysis + DeepGaze vision
- [[01 Projects/pimco-charts/PIMCO Charts|PIMCO Charts]] — Claude API, SVG code generation
- [[01 Projects/video-accessibility/Video Accessibility Platform|Video Accessibility]] — Gemini 2.5 Pro, VTT captions, audio description
- [[01 Projects/pdf-accessibility/PDF Accessibility Checker|PDF Accessibility]] — Claude + Google Cloud Vision
- [[01 Projects/build-a-squad/Build A Squad|Build A Squad]] — Gemini (client-side, no backend)
- [[01 Projects/ac-helper/AC Helper|AC Helper]] — Gemini, natural language commands
## Gotchas & Lessons
- Long AI calls (>30s) will time out on the GCP LB; use HTTP polling, not streaming (see [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]])
- Increase Vite proxy timeout to 5 min for AI-heavy endpoints
- Claude `tool_use` is more reliable than prompt-based JSON for structured output
- The `max_tokens` default (4096) is too low for document generation; set it to 8192 to 16000
- Gemini models change frequently — update model IDs in `llm_factory.py` regularly (Sandbox 2026-03-31)
- GCP Vision + Claude hybrid (PDF Accessibility) — GCV handles image extraction, Claude does semantic analysis
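
The polling approach from the first gotcha can be sketched as a submit-then-poll job registry, so that no single HTTP request outlives the load balancer timeout. Everything here is a hypothetical illustration: `submit_job`, `get_job`, and the in-memory `_jobs` dict are assumed names, and a real deployment with more than one instance would need a shared store in place of a process-local dict.

```python
import threading
import time
import uuid

_jobs = {}
_lock = threading.Lock()


def submit_job(work, *args):
    """Start work in a background thread and return a job id immediately."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"status": "pending", "result": None, "error": None}

    def runner():
        try:
            outcome = {"status": "done", "result": work(*args), "error": None}
        except Exception as exc:
            outcome = {"status": "failed", "result": None, "error": str(exc)}
        with _lock:
            _jobs[job_id] = outcome

    threading.Thread(target=runner, daemon=True).start()
    return job_id


def get_job(job_id):
    """What a status endpoint would return; each call finishes quickly."""
    with _lock:
        return dict(_jobs[job_id])


def slow_generation(prompt):
    time.sleep(0.05)  # stands in for a long model call
    return f"generated:{prompt}"


job_id = submit_job(slow_generation, "report")
while get_job(job_id)["status"] == "pending":
    time.sleep(0.01)  # a browser client would poll every few seconds
final = get_job(job_id)
```

The submit call and every poll each return well inside the 30s window, however long the model call itself takes.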
## Related
- [[wiki/architecture/multi-agent-ai-systems|multi-agent-ai-systems]]
- [[wiki/architecture/rag-architecture|rag-architecture]]
- [[wiki/architecture/gcp-deployment-lb-timeout|gcp-deployment-lb-timeout]]