Switch primary model to flash to fix GCP LB 30s timeout

gemini-3.1-pro-preview takes ~25s per call, hitting the GCP load balancer's 30s hard timeout before analysis completes. The flash model returns in ~5-8s, fitting comfortably within the limit. The pro model is kept as the fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 13:18:24 +00:00 · 1de572fcb0
commit 1de572fcb0
parent e85681b775
1 changed file with 2 additions and 2 deletions
--- a/backend/app/services/gemini_service.py
+++ b/backend/app/services/gemini_service.py
@@ -38,8 +38,8 @@ class GeminiService:
            api_key=api_key,
            http_options={"timeout": _FALLBACK_TIMEOUT_MS},
        )
-        self.model = "gemini-3.1-pro-preview"
-        self.fallback_model = "gemini-3-flash-preview"
+        self.model = "gemini-3-flash-preview"
+        self.fallback_model = "gemini-3.1-pro-preview"

    async def _generate_content(
        self,
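
Note on the ordering: the sketch below is not the code in gemini_service.py. It is a minimal, self-contained illustration of why the faster model goes first when one request has a hard overall deadline. The names call_model, generate_with_fallback, LB_TIMEOUT_S and SIMULATED_LATENCY_S are hypothetical, and the latencies are the approximate figures from the commit message scaled down 100x so the example runs quickly.

```python
import asyncio
import time

# Assumed figures from the commit message, scaled down 100x:
# ~25 s for pro, ~5-8 s for flash, 30 s load balancer limit.
LB_TIMEOUT_S = 0.30
SIMULATED_LATENCY_S = {
    "gemini-3-flash-preview": 0.08,
    "gemini-3.1-pro-preview": 0.25,
}


async def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for the real API call; it only simulates latency."""
    await asyncio.sleep(SIMULATED_LATENCY_S[model])
    return f"{model}: analysis of {prompt!r}"


async def generate_with_fallback(
    prompt: str,
    primary: str,
    fallback: str,
    budget_s: float = LB_TIMEOUT_S,
) -> str:
    """Try the primary model, then the fallback, within one shared time budget."""
    deadline = time.monotonic() + budget_s
    for model in (primary, fallback):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # the budget is spent, so another attempt cannot finish in time
        try:
            return await asyncio.wait_for(call_model(model, prompt), timeout=remaining)
        except asyncio.TimeoutError:
            continue  # this attempt ran out of time; move on to the next model
    raise TimeoutError("no model answered within the budget")


if __name__ == "__main__":
    answer = asyncio.run(
        generate_with_fallback(
            "sample input",
            primary="gemini-3-flash-preview",
            fallback="gemini-3.1-pro-preview",
        )
    )
    print(answer)
```

With the flash model first, the call returns well inside the budget. If the slow model is first and the budget is shorter than its latency, that attempt uses up the whole budget and the fallback never gets a useful chance to run, which is the situation the swap in this commit avoids.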