obsidian/wiki/llm-models/gemini-model-catalog.md
2026-05-09 17:44:30 +01:00

title: Gemini Model Catalog
aliases: google-gemini-models, gemini-api-models
tags: llm, google, gemini, api, models
sources: raw/Gemini API Google AI for Developers.md
created: 2026-05-08
updated: 2026-05-08

Overview

Google's Gemini API model lineup spans text, audio, image, video, music, embeddings, and robotics. Models are grouped by generation (2.5, 3.x) and by tier, ordered from most to least capable: Pro > Flash > Flash-Lite. All are available via Google AI for Developers.


Gemini 2.5 Family (Current Stable)

| Model | Tier | Status | Best For |
| --- | --- | --- | --- |
| `gemini-2.5-pro` | Pro | Stable | Complex tasks, deep reasoning, coding |
| `gemini-2.5-flash` | Flash | Stable | Price-performance, low-latency, high-volume |
| `gemini-2.5-flash-lite` | Flash-Lite | Stable | Fastest + cheapest multimodal in 2.5 family |
| Gemini 2.5 Flash Live | Flash | Preview | Real-time conversational agents, sub-second audio |
| Gemini 2.5 Flash TTS | Flash | Preview | Controllable text-to-speech, fine style/pacing control |
| Gemini 2.5 Pro TTS | Pro | Preview | High-fidelity TTS for podcasts, audiobooks |
| Nano Banana (Gemini 2.5 Flash Image) | Flash | Stable | Native image gen/editing, fast creative workflows |
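
The tier split in the table above can be turned into a small selection helper. This is a minimal sketch: the model IDs come from the table, but the function name `pick_gemini_25_model` and its two boolean inputs are hypothetical, and the decision rules are an illustrative assumption rather than guidance from Google.

```python
def pick_gemini_25_model(needs_deep_reasoning: bool, cost_sensitive: bool) -> str:
    """Pick a stable Gemini 2.5 text model ID from two coarse requirements.

    The rules are an assumption for illustration: reasoning needs win over
    cost, and Flash is the default when neither flag is set.
    """
    if needs_deep_reasoning:
        return "gemini-2.5-pro"         # Pro tier: complex tasks, coding
    if cost_sensitive:
        return "gemini-2.5-flash-lite"  # Flash-Lite tier: fastest + cheapest
    return "gemini-2.5-flash"           # Flash tier: price-performance default


if __name__ == "__main__":
    print(pick_gemini_25_model(needs_deep_reasoning=False, cost_sensitive=True))
```

Keeping the mapping in one function means a later move to a 3.x model only touches one place.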

Gemini 3 Family (Preview / Upcoming)

| Model | Status | Best For |
| --- | --- | --- |
| `gemini-3.1-pro-preview` | Preview | Advanced reasoning, agentic, vibe coding |
| `gemini-3-flash-preview` | Preview | Frontier-class perf at lower cost |
| `gemini-3.1-flash-lite` | Stable | Frontier-class, budget-friendly |
| `gemini-3.1-flash-live-preview` | Preview | Real-time dialogue, voice-first AI |
| `gemini-3.1-flash-tts-preview` | Preview | Low-latency speech generation |
| Nano Banana 2 (image) | Preview | High-efficiency production image gen + editing |
| Nano Banana Pro (image) | Preview | Studio-quality 4K, complex layouts, text rendering |

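Note: "Nano Banana" is Google's name for Gemini's native image generation and editing models (the original Nano Banana is Gemini 2.5 Flash Image). It is not an alias for Imagen: Imagen 4 is a separate text-to-image family, listed on its own under Generative Media Models.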


Audio Models

| Model | Latency | Use Case |
| --- | --- | --- |
| Gemini 3.1 Flash Live | Low | Audio-to-audio (A2A) real-time dialogue, voice-first apps |
| Gemini 3.1 Flash TTS | Low | TTS with expressive audio tags |
| Gemini 2.5 Flash Live | Low | Bidirectional voice + video agents, native audio reasoning |
| Gemini 2.5 Flash TTS | Low | Cost-efficient real-time TTS |
| Gemini 2.5 Pro TTS | High fidelity | Structured workflows (podcasts, audiobooks) |

Generative Media Models

| Model | Type | Status |
| --- | --- | --- |
| Nano Banana 2 | Image gen/edit | Preview |
| Nano Banana Pro | Image gen/edit | Preview |
| Nano Banana (2.5) | Image gen/edit | Stable |
| Imagen 4 | Text-to-image (up to 2K) | Stable |
| Veo 3.1 | Video + synced audio | Preview |
| Veo 3.1 Lite | Low-cost video gen/edit | Preview |

Music Generation Models

| Model | Use Case |
| --- | --- |
| Lyria 3 Pro | Full-length songs, structural coherence |
| Lyria 3 Clip | Short clips, loops, ≤30 sec previews |
| Lyria RealTime Experimental | Granular control, real-time streaming |

Tool & Agent Models

| Model | Capability |
| --- | --- |
| Computer Use Preview | Screen vision + UI actions (click, type, navigate) for browser automation |
| Gemini Deep Research Preview | Agentic multi-step research across hundreds of sources, cited reports |
| Gemini Deep Research Max Preview | Maximum comprehensiveness version of Deep Research |

Specialized Task Models

| Model | Type | Notes |
| --- | --- | --- |
| Gemini Embedding 2 | Multimodal embedding | Text + image + video + audio + PDF → unified vector space |
| Gemini Embedding | Text embedding | High-dimensional vectors for semantic search, RAG |
| Gemini Robotics-ER 1.6 | Embodied reasoning | Physical space understanding, multi-step robotic tasks |
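
To make the "semantic search" use of embedding vectors concrete, here is a minimal sketch of ranking documents by cosine similarity. The three-dimensional vectors are made-up stand-ins for real embedding output (real embeddings have far more dimensions), and the function names are hypothetical; no Gemini API call is made.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors, assumed for illustration only.
    query = [1.0, 0.0, 0.0]
    docs = {
        "doc_a": [0.9, 0.1, 0.0],
        "doc_b": [0.0, 1.0, 0.0],
        "doc_c": [0.5, 0.5, 0.0],
    }
    for name, score in rank_by_similarity(query, docs):
        print(f"{name}: {score:.3f}")
```

In a RAG pipeline the top-ranked documents would then be passed to a generation model as context.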

Model Version Naming Conventions

| Channel | Example ID | Behavior |
| --- | --- | --- |
| Stable | `gemini-2.5-flash` | Fixed version, rarely changes. Recommended for production |
| Preview | `gemini-2.5-flash-preview-09-2025` | Production-eligible, billing enabled, ≥2 weeks deprecation notice |
| Latest | `gemini-flash-latest` | Auto-updates to newest release; 2-week email notice before swap |
| Experimental | `gemini-*-exp-*` | Not for production, restricted rate limits, may disappear |
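
The naming conventions above can be checked mechanically, for example to flag configuration that points at a moving or unstable channel. This is a minimal sketch: the regular expressions are an assumption inferred from the example IDs in the table, not an official grammar for Gemini model IDs, and `classify_channel` is a hypothetical helper name.

```python
import re

# Order matters: check the most specific markers before falling back to
# "stable". The patterns are inferred from the example IDs only.
_CHANNEL_PATTERNS = [
    ("experimental", re.compile(r"-exp(-|$)")),
    ("latest", re.compile(r"-latest$")),
    ("preview", re.compile(r"-preview(-|$)")),
]


def classify_channel(model_id: str) -> str:
    """Return the release channel implied by a model ID's naming."""
    for channel, pattern in _CHANNEL_PATTERNS:
        if pattern.search(model_id):
            return channel
    return "stable"


if __name__ == "__main__":
    for model_id in (
        "gemini-2.5-flash",
        "gemini-2.5-flash-preview-09-2025",
        "gemini-flash-latest",
    ):
        print(model_id, "->", classify_channel(model_id))
```

A deployment check could, for instance, reject any ID classified as `experimental` and require an explicit opt-in for `latest`.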

Deprecated / Shut Down

| Model | Status | Notes |
| --- | --- | --- |
| Gemini 2.0 Flash | Deprecated | 1M context, native tool use |
| Gemini 2.0 Flash-Lite | Deprecated | Fastest gen-2 |
| Gemini 3 Pro Preview | Shut down | |

Key Takeaways

  • 2.5 Flash is the current go-to for cost/performance balance; 2.5 Pro is the choice for complex reasoning
  • Gemini 3.x is mostly in preview: 3.1 Flash-Lite is already stable, while the rest are still preview models
  • Gemini has dedicated Live audio models for real-time voice agents (audio-to-audio streaming)
  • "Nano Banana" = Google's name for Gemini's native image generation/editing models; Imagen 4 is a separate text-to-image family
  • The Computer Use model can automate browser UIs natively, similar to Anthropic's computer use
  • Gemini Embedding 2 is multimodal: it handles text, images, video, audio, and PDF in one embedding space
  • Use the gemini-flash-latest alias with caution: it hot-swaps on new releases with only 2 weeks' notice
  • Experimental models have no stability guarantees and restrictive rate limits


Sources