Switch primary model to flash to fix GCP LB 30s timeout

gemini-3.1-pro-preview takes ~25s per call, hitting the GCP load
balancer's 30s hard timeout before analysis completes. Flash model
returns in ~5-8s, fitting comfortably within the limit. Pro model
kept as fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Vadym Samoilenko 2026-03-18 13:18:24 +00:00
parent e85681b775
commit 1de572fcb0

View file

@ -38,8 +38,8 @@ class GeminiService:
api_key=api_key,
http_options={"timeout": _FALLBACK_TIMEOUT_MS},
)
self.model = "gemini-3.1-pro-preview"
self.fallback_model = "gemini-3-flash-preview"
self.model = "gemini-3-flash-preview"
self.fallback_model = "gemini-3.1-pro-preview"
async def _generate_content(
self,