vault backup: 2026-04-30 21:42:22

2026-04-30 21:42:22 +01:00 · 2026-04-30 21:42:22 +01:00 · 3c2d661732
commit 3c2d661732
parent 522c794f14
10 changed files with 420 additions and 37 deletions
--- a/Projects/video-accessibility/Next-Session-Prompt.md
+++ b/Projects/video-accessibility/Next-Session-Prompt.md
@@ -19,19 +19,19 @@ tags:
 AI SaaS platform for generating accessibility materials (CC, AD, SDH) from video.
 - **Directory:** `/Users/ai_leed/Documents/Projects/Oliver/video-accessibility`
 - **Branch:** `main`
+- **Latest commit (local + pushed):** `3bed598` - fix(glossary+jobs): debug logging + AllJobs filter fix
 - **Server:** `optical-dev` (Docker Compose)
-- **Latest commit:** `3bed598` - fix(glossary+jobs): debug logging + AllJobs filter fix

 ---

 ## What was done TODAY (2026-04-30)

-### Fixed bugs (all in `main`)
+### Fixed bugs: in code (in `main`, pushed)

 | # | File | Problem | Fix |
 |---|------|----------|---------|
 | 1 | `tasks/translate_and_synthesize.py` | `UnboundLocalError: job_doc` at line ~976 | Moved `find_one(job_id)` before `gcs_path()` |
-| 2 | `migrations/scripts/migration_2026-04-30-000002_fix_status_enum.py` | MongoDB `$jsonSchema` відхиляв статус `cancelled` | Новая міграція з `firstBatch` патерном + повний список статусів |
+| 2 | `migrations/scripts/migration_2026-04-30-000002_fix_status_enum.py` | MongoDB `$jsonSchema` відхиляв статус `cancelled` | Нова міграція з `firstBatch` патерном + повний список статусів |
 | 3 | `migrations/run.py` | File did not exist | Created a runner with `connect_to_mongo()` |
 | 4 | `services/gemini.py` | Old model `gemini-2.5-pro` | Updated to `gemini-3.1-pro-preview` |
 | 5 | `core/config.py` + `tasks/tts_synthesis.py` | TTS flash model was outdated | Flash → `gemini-3.1-flash-tts-preview`; Pro stayed on `gemini-2.5-pro-preview-tts` |
@@ -39,58 +39,74 @@ AI SaaS платформа для генерації accessibility-матері
 | 7 | `routes/jobs/JobsList.tsx` | All Jobs showed "no jobs" with the default filters | `useEffect` now always syncs `statusFilter` with the URL parameter (clears it if the param is absent) |
 | 8 | `services/glossary_service.py` | Glossary was not applied and the error was hidden | Detailed debug logging + guard for `source_term_lower=None` + guard for `target_locale=None` |

-### Done manually on the server
+### Fixed bugs: manually on the server (NOT in code)

-```bash
-# optical-dev
-docker compose up -d --build tts-worker   # TTS deadlock: 0/15 → fixed
-python -m app.migrations.run              # Applied migration_2026-04-30-000002
-```
+| # | What | Why |
+|---|-----|------|
+| 9 | `FFMPEG_WORKER_CONCURRENCY=20` → `4` in `.env.production` on optical-dev | OOM crash-loop: 20 prefork processes × ~120 MB = 2.4 GB > 1 GB container limit. The OS OOM killer killed the process with ExitCode=0, OOMKilled=False |
+| 10 | `docker compose up -d --build ffmpeg-worker` on the server | Restart with the new concurrency=4; the worker stabilised and picked up 3 tasks from the queue |
+| 11 | `docker compose up -d --build tts-worker` | TTS deadlock: 0/15 → fixed |
+| 12 | `python -m app.migrations.run` | Applied migration_2026-04-30-000002 |

-### Status of test job `Test5` (`69f3b6d2cde5f3709e55301e`)
+### Final state of test job `Test5` (`69f3b6d2cde5f3709e55301e`)

 - TTS: EN ✓ (15/15), DE-DE ✓ (13/13), FR-CA ✓ (15/15)
-- Video render: **in progress** when the session ended
-- State: `tts_generating` → expected to move to `rendering_qc` or `pending_final_review`
+- Video render: **complete**, all 3 `accessible_video.mp4` files are in GCS
+- State: **`pending_qc`** ✓

 ---

 ## What is NOT resolved: TODO for the next session

-### 1. Glossary: root cause not found (PRIORITY)
+### 1. API crash-loop: Prometheus port conflict (CRITICAL)

-**Symptom:** The glossary is not applied even when the job was created in the correct project with an active glossary.
+**Symptom:** The API container keeps restarting. The logs repeat:
+```
+Failed to start Prometheus server: [Errno 98] Address already in use
+```
+TTS initialization messages also repeat, a sign that the API is starting over and over.

-**What we did:** Added detailed logging to `get_glossary_block_for_job`. The worker logs will now show EXACTLY where `""` is returned.
+**Cause:** Several API processes try to bind the same Prometheus port. Possibly:
+- Several Uvicorn workers (multi-process mode) each try to start Prometheus
+- The previous process did not release the port before the restart

-**Next step:**
+**Next steps:**
+```bash
+# On optical-dev:
+docker compose logs api --tail=100 | grep -E "Prometheus|ERROR|Errno"
+docker inspect accessible-video_api_1 | grep -A5 RestartPolicy
+# Check whether Prometheus is started only in the main process:
+grep -r "prometheus" backend/app/ | grep -v ".pyc"
+```
+
+### 2. Glossary: root cause not found
+
+**Symptom:** The glossary is not applied even when the job is in the correct project with an active glossary.
+
+**What we did:** Added detailed logging to `get_glossary_block_for_job`. After deploy, the logs will show EXACTLY where `""` is returned.
+
+**Next steps:**
 1. Deploy the new changes (`git pull` + `docker compose up -d --build worker api`)
 2. Run a new test job in a project that has a glossary
-3. Check the worker logs: `docker compose logs -f worker | grep -i glossary`
-4. It will be one of these cases:
+3. Check: `docker compose logs -f worker | grep -i glossary`
+4. Expected cases:
    - `Glossary skip job=X: no project_id` → the job is not linked to a project
-   - `Glossary skip job=X: project ABC not found` → `project_id` matches no project (type?)
-   - `Glossary skip job=X: no active glossary for client Y` → the glossary for this client is not active
-   - `Glossary skip job=X: no source text` → `_glossary_source_text` is missing (empty VTT?)
+   - `Glossary skip job=X: project ABC not found` → `project_id` does not match (type?)
+   - `Glossary skip job=X: no active glossary for client Y` → the glossary is not active
+   - `Glossary skip job=X: no source text` → `_glossary_source_text` is empty
    - `Glossary lookup failed ... traceback` → an exception with the full stack
-### 2. Frontend deploy (PRIORITY)
+### 3. Frontend deploy (PRIORITY)

-The changes in `Dashboard.tsx` (Processing counter) and `JobsList.tsx` (AllJobs filter) are not yet deployed to the server.
+The changes in `Dashboard.tsx` and `JobsList.tsx` are **not yet deployed** to the server.

 ```bash
 # On optical-dev:
 cd /opt/projects/video-accessibility
 git pull
 docker compose up -d --build api
-# or, if there is a separate frontend build:
-./scripts/build-frontend.sh
 ```

-### 3. Check the final state of Test5
-
-Check whether the job finished rendering the video and moved to `pending_final_review` or `completed`.
-
 ---

 ## Architecture notes (important to remember)
@@ -98,17 +114,24 @@ docker compose up -d --build api
 ### 3 separate Celery workers on optical-dev

 ```yaml
-worker:      queues default, ingest, notify, render  (concurrency=8)
-tts-worker:  queue tts ONLY                          (concurrency=10, Cloud Run mode)
-ffmpeg-worker: queue ffmpeg                          (concurrency=20, Cloud Run mode)
+worker:        queues default, ingest, notify, render  (concurrency=8)
+tts-worker:    queue tts ONLY                          (concurrency=10, Cloud Run mode)
+ffmpeg-worker: queue ffmpeg                            (concurrency=4 after the OOM fix)
 ```

-> [!warning] Key rule
-> If `tts-worker` is not rebuilt, `synthesize_cue_task` hangs in the queue forever (0/N cues). Always rebuild `tts-worker` after changes to `tts_synthesis.py`.
+> [!warning] Key rules
+> - If `tts-worker` is not rebuilt, `synthesize_cue_task` hangs in the queue (0/N cues)
+> - `ffmpeg-worker` concurrency=4: do not raise it again without increasing the container memory limit
+
+### Why ffmpeg tasks do NOT go to Cloud Run
+
+`_dispatch_ffmpeg` and `_dispatch_ffprobe` always route through the local Celery `ffmpeg` queue, because freeze segments are local files in `/shared-tmp` that Cloud Run cannot see. Cloud Run is used only for: source video duration, video properties, frame extraction, segment re-encoding.
+
+To verify: `video_renderer.py` lines ~233-300 (`_dispatch_ffmpeg`) and ~711-719 (freeze segment duration is always local).

 ### `USE_CELERY_FALLBACK=true` on optical-dev

-When this is set, tasks go to the local Celery instead of Cloud Run. This is needed for debugging.
+When set, all tasks go to the local Celery. `FFMPEG_SERVICE_URL` controls only **individual** ffmpeg calls, not the routing of tasks as a whole.

 ### MongoDB `$jsonSchema` validator

@@ -173,3 +196,9 @@ backend/app/migrations/run.py                   (new file)
 frontend/src/routes/Dashboard.tsx               (Processing counter fix)
 frontend/src/routes/jobs/JobsList.tsx           (AllJobs filter fix)
 ```
+
+**On the server (only `.env.production`, not in git):**
+```
+/opt/projects/video-accessibility/.env.production
+  FFMPEG_WORKER_CONCURRENCY=20 → 4
+```
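The `[Errno 98] Address already in use` symptom from the API crash-loop item above can be reproduced with the standard library alone. The following is a minimal sketch, not the project's code: `try_bind` is a hypothetical helper, and it assumes that a process which cannot bind the metrics port should skip starting its own metrics server rather than crash.

```python
import errno
import socket
from typing import Optional


def try_bind(port: int, host: str = "127.0.0.1") -> Optional[socket.socket]:
    """Return a listening socket, or None if the port is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None  # another process already owns the port
        raise


first = try_bind(0)                # port 0: the OS picks a free port
port = first.getsockname()[1]
second = try_bind(port)            # same port again, like a second API worker
second_got_port = second is not None
first.close()
```

If several Uvicorn workers turn out to be the cause, only the first process would get the port; the others would see `None` here instead of raising at startup.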
--- a/Daily/2026-04-30.md
+++ b/Daily/2026-04-30.md
@@ -842,3 +842,6 @@ tags: [daily]
 - 21:33 | `video-accessibility`
  - **Asked:** Debugged production queue API 403 error and memory issues | Reduced FFMPEG_WORKER_CONCURRENCY from 20 to 4 to prevent OOM-kill, allowing worker to process all 3 queues and concatenate segments | Worker configuration
  - **Done:** FFMPEG worker memory optimization | FFMPEG_WORKER_CONCURRENCY parameter reduced, worker queue processing enabled | Environment configuration, Worker service
+- 21:40 (2min) | `video-accessibility`
+  - **Asked:** Debug 403 Forbidden error on production queue stats API endpoint.
+  - **Done:** Identified authentication issue with GET request to /video-accessibility/api/v1/admin/production/queue-stats endpoint.
--- a/wiki/_master-index.md
+++ b/wiki/_master-index.md
@@ -23,8 +23,8 @@ This 3-hop pattern works for hundreds of articles without vector search.
 | [[wiki/tech-patterns/_index\|tech-patterns/]] | Recurring tech stacks: FastAPI, React/Vite, Next.js, Azure AD, AI, Box, One2Edit, Redis/Celery, cost-tracker | 17 |
 | [[wiki/architecture/_index\|architecture/]] | Cross-cutting architectural patterns: Docker Compose, multi-agent AI, GCP timeout, RAG, hotfolder, optical-dev deploy, cost-tracker, new-project checklist, troubleshooting playbooks, ADR log, Cloud Run Jobs | 11 |
 | [[wiki/client-knowledge/_index\|client-knowledge/]] | Per-client notes for Ford, H&M, L'Oréal, Barclays, Ferrero, 3M | 6 |
-| [[wiki/concepts/_index\|concepts/]] | Atomic knowledge extracted from Claude Code sessions | 86 |
-| [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts: FastAPI+Azure AD+Docker trinity, AI→cost-tracker, Apache+Vite basePath, GCP→REST polling, Box+hotfolder, Docker DNS+AdGuard | 9 |
+| [[wiki/concepts/_index\|concepts/]] | Atomic knowledge extracted from Claude Code sessions | 89 |
+| [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts: FastAPI+Azure AD+Docker trinity, AI→cost-tracker, Apache+Vite basePath, GCP→REST polling, Box+hotfolder, Docker DNS+AdGuard, Celery prefork×faster_whisper memory stacking | 10 |
 | [[wiki/qa/_index\|qa/]] | Filed answers to queries (saved with `--file-back`) | 0 |
 | [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 42 |
 | [[wiki/web-agency/_index\|web-agency/]] | AI-assisted website building & selling: Claude Code, Nanobanana 2, Kling, LaunchPath MCP | 9 |
--- a/wiki/concepts/_index.md
+++ b/wiki/concepts/_index.md
@@ -100,5 +100,9 @@
 | [[wiki/concepts/celery-queue-worker-specialization]] | Named Celery queues: only the container consuming that queue processes tasks — fix bugs in specialised workers by rebuilding THAT container | daily/2026-04-30.md | 2026-04-30 |
 | [[wiki/concepts/gcs-resumable-upload-pattern]] | Browser → backend creates GCS Resumable Session URI → browser uploads chunks directly to GCS, bypassing LB/Apache; 8 MB chunks, 308=continue, resume via Range header | daily/2026-04-30.md | 2026-04-30 |

+| [[wiki/concepts/celery-prefork-pool-startup-memory]] | Celery prefork forks ALL CONCURRENCY workers at startup — CONCURRENCY=20 × 120 MB = 2.4 GB before first task; OOM before any work | daily/2026-04-30.md | 2026-04-30 |
+| [[wiki/concepts/sudo-git-clone-root-ownership]] | `sudo git clone` makes all files root-owned — subsequent user `git pull` fails with Permission denied on .git/FETCH_HEAD; fix: chown -R | daily/2026-04-30.md | 2026-04-30 |
+| [[wiki/concepts/python-fastapi-module-level-singletons]] | `settings = Settings()` at module import level crashes pytest when env vars aren't set — guard with `@lru_cache` function or lazy `@property` | daily/2026-04-30.md | 2026-04-30 |
+
 <!-- Articles added automatically by compile.py -->
 <!-- Format: | [[concepts/slug]] | One-line summary | daily/YYYY-MM-DD.md | date | -->
--- a/wiki/concepts/celery-prefork-pool-startup-memory.md
+++ b/wiki/concepts/celery-prefork-pool-startup-memory.md
@@ -0,0 +1,82 @@
+---
+title: "Celery Prefork Pool — All Workers Fork at Startup"
+aliases:
+  - celery-prefork-startup-memory
+  - celery-concurrency-oom
+tags:
+  - celery
+  - python
+  - docker
+  - memory
+  - worker
+sources:
+  - "daily/2026-04-30.md"
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# Celery Prefork Pool — All Workers Fork at Startup
+
+Celery's default `prefork` pool forks **all** `CONCURRENCY` worker processes immediately at startup, not lazily on first task. Each forked process loads the full Python interpreter plus all imports. With `CONCURRENCY=20` and 120 MB per process, that is 2.4 GB of RAM consumed before a single task is processed — enough to OOM-kill a container and stall a pipeline for 15+ minutes while the cause is invisible in application logs.
+
+## Key Points
+
+- `prefork` (the default pool type) forks N processes at `celery worker` start time
+- Each process is a full Python interpreter with all imports loaded
+- Memory = `CONCURRENCY × per_worker_MB` consumed before any task runs
+- OOM manifests as the container being killed, not a Python exception
+- `CONCURRENCY=1` is safe but eliminates parallelism — tune with the formula below
+
+## Details
+
+### Safe Concurrency Formula
+
+```
+CONCURRENCY = floor(container_memory_MB / per_worker_MB)
+```
+
+Measure `per_worker_MB` with:
+
+```bash
+# Start one worker, check RSS
+celery -A app worker --concurrency=1 &
+ps aux | grep celery
+```
+
+Rough per-process baselines (measure your own workload):
+- Pure Python FastAPI worker: ~60–80 MB
+- Worker that imports `faster_whisper`: ~400–800 MB per worker (model loaded per process)
+- Worker that imports `torch`: 300–500 MB baseline
+
+### Alternative Pool Types
+
+| Pool | Startup behaviour | Use case |
+|------|------------------|----------|
+| `prefork` (default) | All N processes fork immediately | CPU-bound tasks |
+| `solo` | Single-process, no fork | Dev / low-memory containers |
+| `gevent` / `eventlet` | Green threads, shared process | I/O-bound tasks |
+| `threads` | OS threads, shared process | I/O-bound, simpler than gevent |
+
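+Switch with the `--pool` (`-P`) option of `celery worker`, for example `--pool=solo` or `--pool=gevent`, or with the `worker_pool` setting. The `gevent` and `eventlet` pools need the corresponding package installed.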
+
+### Stacking with ML Libraries
+
+If a worker imports a model library at module level (e.g. `faster_whisper`, `torch`), that model is loaded into **every** forked process. With `CONCURRENCY=4` and a 400 MB model, startup RAM = 1.6 GB before any inference runs. See the connection article.
+
+### Symptoms
+
+- Container killed within 10–30 seconds of `docker compose up`
+- No Python traceback — OOM killer logs in `dmesg` / `docker events`
+- `docker stats` shows memory spike to container limit then drop (restart)
+- Tasks never start processing; queue builds up
+
+## Related Concepts
+
+- [[wiki/concepts/faster-whisper-startup-memory]] — model loads at startup in each worker process
+- [[wiki/connections/celery-prefork-faster-whisper-memory-stacking]] — the combined effect when both apply
+- [[wiki/concepts/docker-compose-cpu-limits-env]] — memory limits in Compose override files
+- [[wiki/concepts/celery-queue-worker-specialization]] — specialised workers, smaller CONCURRENCY per service
+
+## Sources
+
+- [[daily/2026-04-30.md]] — Session 21:37, ffmpeg-worker OOM diagnosis; CONCURRENCY=20, 2.4 GB pre-task RAM, 15-minute pipeline stall
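The safe-concurrency formula from the article above can be checked against the numbers in this incident. This is a small sketch under stated assumptions: the function names are illustrative, the 120 MB figure is the approximate per-process size quoted above, and the 1 GB limit is the container limit recorded in the session notes.

```python
import math


def startup_memory_mb(concurrency: int, per_worker_mb: float) -> float:
    """Memory held by a prefork pool before the first task runs."""
    return concurrency * per_worker_mb


def safe_concurrency(container_memory_mb: float, per_worker_mb: float) -> int:
    """Upper bound on concurrency; never below one worker."""
    return max(1, math.floor(container_memory_mb / per_worker_mb))


before_fix_mb = startup_memory_mb(20, 120)   # 20 workers at ~120 MB each
ceiling = safe_concurrency(1024, 120)        # 1 GB container limit
```

The formula gives a ceiling with no headroom for memory used by the tasks themselves, which is one reason to settle on a lower value such as the 4 chosen in the session.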
--- a/wiki/concepts/python-fastapi-module-level-singletons.md
+++ b/wiki/concepts/python-fastapi-module-level-singletons.md
@@ -0,0 +1,97 @@
+---
+title: "Module-Level Singletons Break pytest — Use Lazy Initialisation"
+aliases:
+  - module-level-settings-pytest
+  - lazy-singleton-fastapi
+  - settings-import-time-instantiation
+tags:
+  - python
+  - fastapi
+  - pytest
+  - testing
+  - pydantic
+sources:
+  - "daily/2026-04-30.md"
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# Module-Level Singletons Break pytest — Use Lazy Initialisation
+
+Instantiating `Settings()`, `SomeService()`, or any object that reads environment variables at **module import time** causes pytest to fail when those env vars are not set in the test environment — even for tests that never call that module's functions. Python imports all referenced modules on `import`, so `settings = Settings()` at the top of `config.py` runs as soon as any test file imports anything from that package.
+
+## Key Points
+
+- `Settings()` at module level runs at `import` time, not at call time
+- pytest imports modules eagerly — a test for `routes/health.py` may trigger `config.py` → `Settings()` → `ValidationError`
+- The failure looks like a config error, not a test design problem
+- Fix: wrap in `@lru_cache` function or `@property` so instantiation is deferred to first use
+- Pydantic `BaseSettings` validation runs in `__init__` — there is no "lazy" mode
+
+## Details
+
+### Anti-Pattern
+
+```python
+# config.py  ← runs at import time
+settings = Settings()          # crashes if MONGO_URL not set in test env
+
+# service.py
+db_service = DatabaseService() # same problem
+```
+
+### Fix 1 — `@lru_cache` function (recommended for FastAPI)
+
+```python
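+from functools import lru_cache
+
+from fastapi import APIRouter, Depends
+
+router = APIRouter()
+
+@lru_cache
+def get_settings() -> Settings:
+    # Settings is the project's BaseSettings subclass; built on first call, then cached
+    return Settings()
+
+# Use as FastAPI dependency
+@router.get("/")
+async def handler(settings: Settings = Depends(get_settings)):
+    ...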
+```
+
+Tests can override with `app.dependency_overrides[get_settings] = lambda: FakeSettings()`.
+
+### Fix 2 — `@property` on a config holder
+
+```python
+class _Config:
+    _settings: Settings | None = None
+
+    @property
+    def settings(self) -> Settings:
+        if self._settings is None:
+            self._settings = Settings()
+        return self._settings
+
+config = _Config()  # safe — no Settings() call yet
+```
+
+### Fix 3 — pytest `monkeypatch` / `.env` file
+
+For tests that genuinely need the real Settings, provide env vars via a `conftest.py`:
+
+```python
+import pytest
+
+@pytest.fixture(autouse=True)
+def env_vars(monkeypatch):
+    monkeypatch.setenv("MONGO_URL", "mongodb://localhost:27017/test")
+    monkeypatch.setenv("SECRET_KEY", "test-secret")
+```
+
+### Why Python 3.14 Makes This Worse
+
+A very new interpreter such as Python 3.14 may lack pre-built wheels for Rust-extension packages (`pydantic-core`, `cryptography`). Neither package has a pure-Python fallback, so the installer tries to build from source and fails without a Rust toolchain. Note that `python = "^3.11"` in `pyproject.toml` still allows 3.14; use a bounded range such as `python = ">=3.11,<3.14"` and run tests in Docker matching the production Python version.
+
+## Related Concepts
+
+- [[wiki/concepts/poetry-docker-version-mismatch]] — Poetry / Python version mismatch causing silent failures
+- [[wiki/concepts/time-sleep-blocks-asyncio]] — another class of import-time footgun in async FastAPI
+
+## Sources
+
+- [[daily/2026-04-30.md]] — Session 13:36, test suite fixes; module-level Settings() crashes, aiohttp mock pattern
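The deferred-instantiation idea from the article above can be tried without FastAPI or Pydantic. In this sketch the `Settings` dataclass is a stand-in for a `BaseSettings` subclass and `MONGO_URL` is the only field, both assumptions made to keep the example self-contained.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    mongo_url: str


@lru_cache
def get_settings() -> Settings:
    # Reads the environment on the first call, not when the module is imported.
    return Settings(mongo_url=os.environ["MONGO_URL"])


os.environ["MONGO_URL"] = "mongodb://localhost:27017/test"
a = get_settings()
b = get_settings()          # served from the cache, same object as `a`

# A test that changes the environment has to reset the cache first.
os.environ["MONGO_URL"] = "mongodb://localhost:27017/other"
get_settings.cache_clear()
c = get_settings()
```

Importing this module never fails on a missing variable; the `KeyError` would surface only in code paths that actually call `get_settings()`.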
--- a/wiki/concepts/sudo-git-clone-root-ownership.md
+++ b/wiki/concepts/sudo-git-clone-root-ownership.md
@@ -0,0 +1,74 @@
+---
+title: "sudo git clone Makes Files Root-Owned — User git Pull Fails"
+aliases:
+  - sudo-git-clone-root-files
+  - git-permission-denied-fetch-head
+tags:
+  - git
+  - linux
+  - server
+  - permissions
+sources:
+  - "daily/2026-04-30.md"
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# sudo git clone Makes Files Root-Owned — User git Pull Fails
+
+Running `sudo git clone` on a server creates every file and directory — including the entire `.git/` folder — owned by `root`. Any subsequent `git pull` or `git fetch` run as a regular user fails with `Permission denied` on `.git/FETCH_HEAD` (or similar index files), even though the user can read the working tree.
+
+## Key Points
+
+- `sudo git clone` → all files owned by `root:root`
+- `git pull` as a non-root user hits a write permission error on `.git/FETCH_HEAD`
+- The error message looks like a network or credential issue but is purely a filesystem ownership problem
+- Fix: `sudo chown -R $USER:$USER /opt/project`
+- Prevention: never use `sudo` for `git clone` unless the repo must be root-owned
+
+## Details
+
+### The Error
+
+```
+error: cannot open .git/FETCH_HEAD: Permission denied
+```
+
+or
+
+```
+fatal: Unable to create '/opt/project/.git/index.lock': Permission denied
+```
+
+### Fix
+
+```bash
+sudo chown -R $USER:$USER /opt/project
+# Verify
+ls -la /opt/project/.git/
+```
+
+### Prevention
+
+If deploying to `/opt/` or `/srv/` (root-owned dirs), create the directory first, then clone as the service user:
+
+```bash
+sudo mkdir -p /opt/project
+sudo chown deploy:deploy /opt/project
+git clone git@github.com:org/project.git /opt/project
+```
+
+Or use `sudo -u deploy git clone ...` to clone as the deploy user directly.
+
+### Why This Happens
+
+`sudo` switches the effective UID to root. `git clone` creates all files with the current effective UID as owner. There is no `--chown` flag on `git clone`, unlike the Dockerfile `COPY --chown` instruction.
+
+## Related Concepts
+
+- [[wiki/concepts/monorepo-deploy-script-pitfall]] — another class of silent git failure during deploys
+- [[wiki/concepts/python-service-deployment-dotenv]] — deploy checklist for Python services
+
+## Sources
+
+- [[daily/2026-04-30.md]] — Session 12:11, re-deploy after project folder deletion; sudo git clone footgun discovered
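Before running `git pull` on a server, the ownership problem described in the article above can be detected up front. This is a hedged sketch for Linux: `foreign_owned` is a hypothetical helper name, and the temporary directory only stands in for a real checkout such as `/opt/project`.

```python
import os
import tempfile
from pathlib import Path


def foreign_owned(repo: Path, uid: int) -> list[Path]:
    """List paths under repo/.git that are not owned by the given UID."""
    git_dir = repo / ".git"
    candidates = [git_dir, *git_dir.rglob("*")]
    return [p for p in candidates if p.lstat().st_uid != uid]


# Demo on a throwaway directory created by the current user.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / ".git").mkdir()
    (repo / ".git" / "FETCH_HEAD").write_text("")
    mismatched = foreign_owned(repo, os.getuid())
```

On a checkout made with `sudo git clone`, the returned list would contain the root-owned entries that the `chown -R` fix needs to cover.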
--- a/wiki/connections/_index.md
+++ b/wiki/connections/_index.md
@@ -15,5 +15,7 @@
 | [[wiki/connections/box-api-hotfolder-pattern]] | Box API ↔ hotfolder daemon — always paired; archive pattern prevents double-processing | 2026-04-27 | 2026-04-27 |
 | [[wiki/connections/docker-dns-adguard-split-horizon]] | Docker DNS ↔ AdGuard split-horizon — Docker containers inherit router DNS, not AdGuard; explicit dns: config required | daily/2026-04-28.md | 2026-04-28 |

+| [[wiki/connections/celery-prefork-faster-whisper-memory-stacking]] | Celery prefork fork-all ↔ faster_whisper model-at-startup — CONCURRENCY × model_size GB consumed before first task | daily/2026-04-30.md | 2026-04-30 |
+
 <!-- Articles added automatically by compile.py -->
 <!-- Format: | [[connections/slug]] | ConceptA ↔ ConceptB | daily/YYYY-MM-DD.md | date | -->
--- a/wiki/connections/celery-prefork-faster-whisper-memory-stacking.md
+++ b/wiki/connections/celery-prefork-faster-whisper-memory-stacking.md
@@ -0,0 +1,86 @@
+---
+title: "Connection: Celery Prefork × faster_whisper — Memory Stacking"
+connects:
+  - "concepts/celery-prefork-pool-startup-memory"
+  - "concepts/faster-whisper-startup-memory"
+sources:
+  - "daily/2026-04-30.md"
+created: 2026-04-30
+updated: 2026-04-30
+---
+
+# Connection: Celery Prefork × faster_whisper — Memory Stacking
+
+## The Connection
+
+Two independent startup-memory behaviours combine multiplicatively when `faster_whisper` is imported inside a Celery worker module:
+
+1. **Celery prefork** forks ALL `CONCURRENCY` worker processes at `celery worker` start — each is a full Python interpreter with all imports loaded.
+2. **faster_whisper** loads the full transcription model into RAM at startup (when `WhisperModel(...)` is called at module level or in a `worker_process_init` signal handler, which runs in every pool child process).
+
+Result: `CONCURRENCY=4` with a 400 MB Whisper model = **1.6 GB** consumed before the first transcription task is dequeued.
+
+## Key Insight
+
+> Neither behaviour is a bug in isolation — the danger is invisible until they are combined in the same container.
+
+The `faster-whisper-startup-memory` article documents the per-container model loading cost. The `celery-prefork-pool-startup-memory` article documents the per-worker process forking cost. When they stack, the formula becomes:
+
+```
+total_startup_RAM = CONCURRENCY × (base_worker_MB + model_size_MB)
+```
+
+Example with `large-v3` model (~1.5 GB) and `CONCURRENCY=4`:
+
+```
+4 × (80 MB interpreter + 1500 MB model) = 6.3 GB before first task
+```
+
+A container with a 4 GB memory limit is OOM-killed before it processes anything.
+
+## Evidence
+
+- Session 21:37 (2026-04-30): ffmpeg-worker with `CONCURRENCY=20`, ~120 MB/process → 2.4 GB, container OOM-killed, 15-minute pipeline stall
+- The stall was compounded because Celery silently retries tasks that were in-flight when the worker died, creating a second wave of OOM on restart
+
+## Solutions
+
+### Option A — Reduce concurrency to match model size
+
+```
+CONCURRENCY = floor(container_memory_MB / (base_MB + model_MB))
+```
+
+### Option B — Separate transcription into its own single-worker container
+
+Keep `CONCURRENCY=1` for the whisper worker, scale by adding containers, not by increasing CONCURRENCY. Each container has exactly one model copy.
+
+### Option C — Load model lazily (inside the task, not at import)
+
+```python
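+from faster_whisper import WhisperModel
+
+_model = None  # one copy per worker process, created on first use
+
+@app.task  # `app` is the project's Celery instance
+def transcribe(audio_path: str) -> str:
+    global _model
+    if _model is None:
+        _model = WhisperModel("large-v3")
+    # transcribe() returns a lazy segment generator plus info; join the text so the result can be serialized
+    segments, _info = _model.transcribe(audio_path)
+    return " ".join(segment.text.strip() for segment in segments)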
+```
+
+Downside: first task in each process pays the load latency (~5–15 s). Subsequent tasks in the same process reuse the loaded model.
+
+### Option D — Use `solo` or `threads` pool
+
+The `solo` pool (`--pool=solo`) runs tasks in the main process with no forking, and the `threads` pool shares one process across its threads, so either way only one model copy is loaded. Appropriate for GPU workers where parallelism is handled at the GPU level.
+
+## Related Concepts
+
+- [[wiki/concepts/celery-prefork-pool-startup-memory]] — Celery fork-all-at-startup behaviour
+- [[wiki/concepts/faster-whisper-startup-memory]] — model loaded at container start
+- [[wiki/concepts/celery-queue-worker-specialization]] — isolating whisper work to dedicated containers
+- [[wiki/concepts/docker-compose-cpu-limits-env]] — setting memory limits in Compose
+
+## Sources
+
+- [[daily/2026-04-30.md]] — Session 21:37, Celery ffmpeg-worker OOM; identified as combined prefork + model-loading issue
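The stacked formula and Option A from the article above can be worked through with the article's own example figures. This is an illustrative sketch: the function names are made up for the example, and the 80 MB base and 1500 MB model sizes are the approximations quoted above, not measurements.

```python
import math


def total_startup_ram_mb(concurrency: int, base_mb: float, model_mb: float) -> float:
    """Each forked worker carries its own interpreter plus its own model copy."""
    return concurrency * (base_mb + model_mb)


def concurrency_for_limit(limit_mb: float, base_mb: float, model_mb: float) -> int:
    """Option A: largest concurrency that fits under the container limit."""
    return math.floor(limit_mb / (base_mb + model_mb))


stacked_mb = total_startup_ram_mb(4, 80, 1500)      # the large-v3 example
fits_in_4gb = concurrency_for_limit(4096, 80, 1500)
```

Under these assumptions a 4 GB container fits two workers rather than four, before any allowance for memory used during inference.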
--- a/wiki/log.md
+++ b/wiki/log.md
@@ -1,6 +1,12 @@

 # Build Log

+## [2026-04-30T23:30:00+01:00] compile | 2026-04-30.md (pass 2)
+- Source: daily/2026-04-30.md
+- Articles created: [[wiki/concepts/celery-prefork-pool-startup-memory]], [[wiki/concepts/sudo-git-clone-root-ownership]], [[wiki/concepts/python-fastapi-module-level-singletons]], [[wiki/connections/celery-prefork-faster-whisper-memory-stacking]]
+- Articles updated: (none)
+- Index updates: [[wiki/concepts/_index]] (86→89); [[wiki/connections/_index]] (9→10); [[wiki/_master-index]] (concepts 86→89, connections 9→10)
+
 ## [2026-04-30T21:00:00+01:00] compile | 2026-04-30.md
 - Source: daily/2026-04-30.md
 - Articles created: [[wiki/concepts/pydub-ffmpeg-silent-dependency]], [[wiki/concepts/lameenc-bytearray-gcs-upload]], [[wiki/concepts/apache-mod-alias-proxy-priority]], [[wiki/concepts/faster-whisper-startup-memory]], [[wiki/concepts/celery-redis-queue-flush-on-deterministic-error]], [[wiki/concepts/cline-lm-studio-openai-compatible]], [[wiki/concepts/celery-queue-worker-specialization]], [[wiki/concepts/gcs-resumable-upload-pattern]]