fix(deploy): restore original memory limits on ffmpeg/whisper workers

faster_whisper loads its model into RAM at startup regardless of whether
tasks are routed to Cloud Run, so reducing the whisper-worker limit to 512M
caused an OOM kill on container start. Restored the original limits
(ffmpeg: 1G, whisper: 2G).

Cloud Run URLs (FFMPEG_SERVICE_URL / WHISPER_SERVICE_URL) remain set so CPU
offload is still active.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Vadym Samoilenko 2026-04-30 14:32:24 +01:00
parent ec1ce5c13a
commit 24d93277de

@@ -85,18 +85,18 @@ services:
   # ── Pipeline workers — enabled in fallback mode ────────────────────────────
-  # ffmpeg-worker: CPU-intensive encoding now runs on Cloud Run (ffmpeg-http-service).
-  # Container is a lightweight HTTP dispatcher — reduced resource limits.
+  # ffmpeg-worker: CPU-intensive encoding runs on Cloud Run (ffmpeg-http-service).
+  # Memory limit kept at 1G — local ffmpeg may still run during GCS file staging.
   ffmpeg-worker:
     deploy:
       replicas: 1
       resources:
         limits:
-          memory: 256M
-          cpus: '0.25'
+          memory: 1G
+          cpus: '0.5'
         reservations:
-          memory: 128M
-          cpus: '0.05'
+          memory: 256M
+          cpus: '0.1'
     environment:
       FFMPEG_SERVICE_URL: "https://ffmpeg-http-service-bcb6ipdqka-uc.a.run.app"
@@ -111,17 +111,18 @@ services:
           memory: 128M
           cpus: '0.1'
-  # whisper-worker: Whisper inference now runs on Cloud Run (whisper-http-service).
-  # Container is a lightweight HTTP dispatcher — reduced resource limits.
+  # whisper-worker: Whisper inference runs on Cloud Run (whisper-http-service).
+  # Memory limit kept at 2G — faster_whisper loads the model into memory at startup
+  # regardless of whether tasks are routed to Cloud Run.
   whisper-worker:
     deploy:
       replicas: 1
       resources:
         limits:
+          memory: 2G
+          cpus: '0.5'
+        reservations:
           memory: 512M
           cpus: '0.25'
-        reservations:
-          memory: 256M
-          cpus: '0.05'
     environment:
       WHISPER_SERVICE_URL: "https://whisper-http-service-bcb6ipdqka-uc.a.run.app"
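
The regression fixed here (a memory limit set below what the process needs at startup) could be caught before deploy with a small check like the sketch below. It is not part of this commit. The helper names and the 1G startup footprint are hypothetical: the commit does not state how much RAM the loaded faster_whisper model actually uses, so a measured value would need to be substituted.

```python
import re

# Multipliers for the unit suffixes Compose accepts in byte values
# (b, k, m, g, with an optional trailing "b" on k/m/g).
_UNIT_BYTES = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_BYTE_VALUE = re.compile(r"(\d+)(b|kb?|mb?|gb?)", re.IGNORECASE)


def parse_memory(value: str) -> int:
    """Convert a Compose-style byte value such as '512M' or '2G' to bytes."""
    match = _BYTE_VALUE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognised byte value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_BYTES[unit[0].lower()]


def limit_covers(limit: str, required: str) -> bool:
    """Return True if the memory limit is at least the required footprint."""
    return parse_memory(limit) >= parse_memory(required)


# ASSUMPTION: placeholder startup footprint for the whisper worker.
# Replace with a measured value for the model actually loaded.
ASSUMED_WHISPER_STARTUP = "1G"

print(limit_covers("512M", ASSUMED_WHISPER_STARTUP))  # reduced limit
print(limit_covers("2G", ASSUMED_WHISPER_STARTUP))    # restored limit
```

Under that assumed footprint, the reduced 512M limit fails the check and the restored 2G limit passes, which matches the OOM behaviour described in the commit message.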