vault backup: 2026-04-30 21:42:22
parent: 522c794f14
commit: 3c2d661732
10 changed files with 420 additions and 37 deletions
@@ -19,19 +19,19 @@ tags:

AI SaaS platform for generating accessibility materials (CC, AD, SDH) from video.
- **Directory:** `/Users/ai_leed/Documents/Projects/Oliver/video-accessibility`
- **Branch:** `main`
- **Last commit (local + pushed):** `3bed598`, fix(glossary+jobs): debug logging + AllJobs filter fix
- **Server:** `optical-dev` (Docker Compose)

---

## What was done TODAY (2026-04-30)

### Bugs fixed in code (in `main`, pushed)

| # | File | Problem | Fix |
|---|------|----------|---------|
| 1 | `tasks/translate_and_synthesize.py` | `UnboundLocalError: job_doc` at line ~976 | Moved `find_one(job_id)` before `gcs_path()` |
| 2 | `migrations/scripts/migration_2026-04-30-000002_fix_status_enum.py` | MongoDB `$jsonSchema` rejected the `cancelled` status | New migration using the `firstBatch` pattern + the full list of statuses |
| 3 | `migrations/run.py` | File did not exist | Created a runner using `connect_to_mongo()` |
| 4 | `services/gemini.py` | Outdated model `gemini-2.5-pro` | Updated to `gemini-3.1-pro-preview` |
| 5 | `core/config.py` + `tasks/tts_synthesis.py` | TTS Flash model outdated | Flash → `gemini-3.1-flash-tts-preview`; Pro stays on `gemini-2.5-pro-preview-tts` |
@@ -39,58 +39,74 @@ AI SaaS platform for generating accessibility materials (CC, AD, SDH) from video.

| 7 | `routes/jobs/JobsList.tsx` | All Jobs showed "no jobs" with the default filters | `useEffect` now always syncs `statusFilter` with the URL parameter (clearing it when the param is absent) |
| 8 | `services/glossary_service.py` | Glossary was not applied and the error was swallowed | Detailed debug logging + guard for `source_term_lower=None` + guard for `target_locale=None` |
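
Bug #1 is a classic Python scoping trap: a name that is assigned anywhere inside a function is local to the whole function, so reading it before the assignment raises `UnboundLocalError` (even if a global with the same name exists). A minimal reconstruction, where `find_one`, `gcs_path` and `job_doc` mirror the table but the bodies are hypothetical stand-ins, not the project's code:

```python
def find_one(job_id: str) -> dict:
    """Stand-in for the MongoDB lookup of a job document."""
    return {"_id": job_id, "client": "acme"}


def gcs_path(job_doc: dict, locale: str) -> str:
    """Stand-in for building a GCS object path from job fields."""
    return f"gs://bucket/{job_doc['client']}/{job_doc['_id']}/{locale}/audio.mp3"


def synthesize_broken(job_id: str, locale: str) -> str:
    path = gcs_path(job_doc, locale)  # job_doc is read here...
    job_doc = find_one(job_id)        # ...but only assigned here -> UnboundLocalError
    return path


def synthesize_fixed(job_id: str, locale: str) -> str:
    job_doc = find_one(job_id)        # fetch the document first, as in the fix
    return gcs_path(job_doc, locale)
```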
### Bugs fixed manually on the server (NOT in code)

| # | What | Why |
|---|-----|------|
| 9 | `FFMPEG_WORKER_CONCURRENCY=20` → `4` in `.env.production` on optical-dev | OOM crash loop: 20 prefork processes × ~120 MB = 2.4 GB > 1 GB container limit. The OS OOM killer was killing the process, yet it exited with ExitCode=0, OOMKilled=False |
| 10 | `docker compose up -d --build ffmpeg-worker` on the server | Restart with the new concurrency=4; the worker stabilised and picked up 3 tasks from the queue |
| 11 | `docker compose up -d --build tts-worker` | TTS deadlock: 0/15 → fixed |
| 12 | `python -m app.migrations.run` | Applied migration_2026-04-30-000002 |
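
A sketch of what the "firstBatch pattern" from bug #2 most likely refers to: read the current validator from the reply of the `listCollections` command (matching collections arrive in `cursor.firstBatch`), replace only the `status` enum, and send the result back with `collMod`. The collection name `jobs`, the schema path `$jsonSchema.properties.status.enum` and the status list are assumptions; the actual migration file is not shown in this diff.

```python
import copy
from typing import List

# Assumption: only the statuses mentioned in this note; the real "full list" is not shown.
STATUSES: List[str] = [
    "tts_generating", "rendering_qc", "pending_qc",
    "pending_final_review", "completed", "cancelled",
]


def build_collmod(list_collections_reply: dict, collection: str, statuses: List[str]) -> dict:
    """Build a collMod command that keeps the current validator but replaces the status enum.

    The collection's options sit in reply["cursor"]["firstBatch"][0].
    """
    batch = list_collections_reply["cursor"]["firstBatch"]
    if not batch:
        raise LookupError(f"collection {collection!r} not found")
    # Deep copy so the caller's reply dict is not mutated.
    validator = copy.deepcopy(batch[0].get("options", {}).get("validator", {}))
    schema = validator.setdefault("$jsonSchema", {})
    status = schema.setdefault("properties", {}).setdefault("status", {})
    status["enum"] = list(statuses)
    # The command name must be the first key of the command document.
    return {"collMod": collection, "validator": validator}
```

With pymongo it would be used as `db.command(build_collmod(db.command("listCollections", filter={"name": "jobs"}), "jobs", STATUSES))`.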

### Final state of the test job `Test5` (`69f3b6d2cde5f3709e55301e`)

- TTS: EN ✓ (15/15), DE-DE ✓ (13/13), FR-CA ✓ (15/15)
- Video render: **completed**; all 3 `accessible_video.mp4` files are in GCS
- State: **`pending_qc`** ✓

---
## Not resolved: TODO for the next session

### 1. API crash loop: Prometheus port conflict (CRITICAL)

**Symptom:** the API container keeps restarting. The logs repeatedly show:

```
Failed to start Prometheus server: [Errno 98] Address already in use
```

TTS initialization messages also repeat, a sign that the API is starting again and again.

**Cause:** several API processes try to bind the same Prometheus port. Possible reasons:
- Several Uvicorn workers (multi-process mode), each trying to start Prometheus
- The previous process had not released the port before the restart

**Next steps:**

```bash
# On optical-dev:
docker compose logs api --tail=100 | grep -E "Prometheus|ERROR|Errno"
docker inspect accessible-video_api_1 | grep -A5 RestartPolicy
# Check whether Prometheus is started only in the main process (Python sources only):
grep -rni --include='*.py' prometheus backend/app/
```
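
A minimal sketch of a guard that turns "address already in use" into a warning instead of a crash. It uses only the standard `socket` module to reproduce the bind step; `try_bind` is a hypothetical helper, and in the real app the same `except OSError` check would wrap whatever starts the exporter (for example `prometheus_client.start_http_server`). This only hides the symptom: if several Uvicorn workers need metrics, the proper fix is to start the exporter once (e.g. in the main process) or to use the Prometheus client's multiprocess mode.

```python
import errno
import logging
import os
import socket
from typing import Optional

log = logging.getLogger("metrics")


def try_bind(port: int, host: str = "127.0.0.1") -> Optional[socket.socket]:
    """Listen on host:port, or return None if another process already holds the port.

    Hypothetical helper: it reproduces only the bind step that fails with
    "[Errno 98] Address already in use" (EADDRINUSE on Linux).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            # Another process got there first: log and carry on instead of crash-looping.
            log.warning("metrics port %d in use, pid %d skips the exporter", port, os.getpid())
            return None
        raise  # any other bind error is a genuine failure
    sock.listen()
    return sock
```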

### 2. Glossary: cause not found

**Symptom:** the glossary is not applied even when the job is in the correct project with an active glossary.

**What we did:** added detailed logging to `get_glossary_block_for_job`. After the deploy, the logs will show EXACTLY where it returns `""`.

**Next steps:**
1. Deploy the new changes (`git pull` + `docker compose up -d --build worker api`)
2. Run a new test job in a project that has a glossary
3. Check: `docker compose logs -f worker | grep -i glossary`
4. Expected variants:
   - `Glossary skip job=X: no project_id` → the job is not linked to a project
   - `Glossary skip job=X: project ABC not found` → `project_id` matches no project (type mismatch?)
   - `Glossary skip job=X: no active glossary for client Y` → the glossary is not active
   - `Glossary skip job=X: no source text` → `_glossary_source_text` is empty
   - `Glossary lookup failed ... traceback` → an exception with the full stack trace
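
A minimal sketch of the guard chain those log lines imply, assuming the job is a dict and that projects and glossaries are fetched through injected lookup callables. The function name `glossary_block_for_job`, the lookups, the `terms` structure and the output format are hypothetical; only the log messages come from the list above.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("glossary")

Lookup = Callable[[Optional[str]], Optional[dict]]


def glossary_block_for_job(job: dict, find_project: Lookup, find_active_glossary: Lookup) -> str:
    """Return a glossary block for the prompt, or "" after logging why it was skipped."""
    job_id = job.get("_id", "?")
    project_id = job.get("project_id")
    if not project_id:
        log.info("Glossary skip job=%s: no project_id", job_id)
        return ""
    project = find_project(project_id)
    if project is None:
        log.info("Glossary skip job=%s: project %s not found", job_id, project_id)
        return ""
    client_id = project.get("client_id")
    glossary = find_active_glossary(client_id)
    if glossary is None:
        log.info("Glossary skip job=%s: no active glossary for client %s", job_id, client_id)
        return ""
    source_text = job.get("_glossary_source_text") or ""
    if not source_text.strip():
        log.info("Glossary skip job=%s: no source text", job_id)
        return ""
    try:
        text_lower = source_text.lower()
        hits = [
            term for term in glossary.get("terms", [])
            # Guard against incomplete entries (cf. the source_term_lower=None fix).
            if term.get("source") and term.get("target")
            and term["source"].lower() in text_lower
        ]
    except Exception:
        log.exception("Glossary lookup failed job=%s", job_id)  # message plus full traceback
        return ""
    return "\n".join(f"{term['source']} => {term['target']}" for term in hits)
```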

### 3. Frontend deploy (PRIORITY)

The changes to `Dashboard.tsx` (Processing counter) and `JobsList.tsx` (AllJobs filter) are **not yet deployed** to the server.

```bash
# On optical-dev:
cd /opt/projects/video-accessibility
git pull
docker compose up -d --build api
# or, if there is a separate frontend build:
./scripts/build-frontend.sh
```

---

## Architecture notes (important to remember)
@@ -98,17 +114,24 @@ docker compose up -d --build api

### 3 separate Celery workers on optical-dev

```yaml
worker: queues default, ingest, notify, render (concurrency=8)
tts-worker: queue tts ONLY (concurrency=10, Cloud Run mode)
ffmpeg-worker: queue ffmpeg (concurrency=4 after the OOM fix)
```

> [!warning] Key rules
> - If `tts-worker` is not rebuilt, `synthesize_cue_task` hangs in the queue (0/N cues); always rebuild `tts-worker` after changes to `tts_synthesis.py`
> - `ffmpeg-worker` concurrency=4: do not raise it again without increasing the container memory limit
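
A related failure mode is a task routed to a queue that no running worker consumes: it waits forever, with the same 0/N symptom. A minimal static check, assuming routes are kept in a Celery-style `task_routes` mapping (`{"task.name": {"queue": "..."}}`) and that the worker-to-queue assignment is the one listed above; every task name except `synthesize_cue_task` is hypothetical.

```python
from typing import Dict, List

# Worker container -> queues it consumes, as listed in the block above.
WORKER_QUEUES: Dict[str, List[str]] = {
    "worker": ["default", "ingest", "notify", "render"],
    "tts-worker": ["tts"],
    "ffmpeg-worker": ["ffmpeg"],
}

# Celery-style task_routes; task names are hypothetical except synthesize_cue_task.
TASK_ROUTES: Dict[str, Dict[str, str]] = {
    "tasks.tts_synthesis.synthesize_cue_task": {"queue": "tts"},
    "tasks.render.render_video": {"queue": "render"},
    "tasks.video.run_ffmpeg": {"queue": "ffmpeg"},
}


def unconsumed_routes(
    routes: Dict[str, Dict[str, str]], worker_queues: Dict[str, List[str]]
) -> Dict[str, str]:
    """Return {task_name: queue} for every route whose queue no worker consumes."""
    consumed = {queue for queues in worker_queues.values() for queue in queues}
    return {task: r["queue"] for task, r in routes.items() if r["queue"] not in consumed}
```

Removing `tts-worker` from `WORKER_QUEUES` makes the check flag `synthesize_cue_task`, which is exactly the situation where cues stay at 0/N.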
### Why ffmpeg tasks do NOT go to Cloud Run

`_dispatch_ffmpeg` and `_dispatch_ffprobe` always route through the local Celery `ffmpeg` queue, because freeze segments are local files in `/shared-tmp` that Cloud Run cannot see. Cloud Run is used only for: source video duration, video properties, frame extraction, and segment re-encoding.

To verify: `video_renderer.py` lines ~233-300 (`_dispatch_ffmpeg`) and ~711-719 (freeze segment duration is always computed locally).

### `USE_CELERY_FALLBACK=true` on optical-dev

When this is set, all tasks go to the local Celery instead of Cloud Run. `FFMPEG_SERVICE_URL` controls only **individual** ffmpeg calls, not the routing of all tasks.
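
The routing rules in the two sections above can be summarised as a small decision function. This is a hypothetical sketch, not the code in `video_renderer.py`: the function name, its parameters and the `"local"`/`"cloud_run"` labels are assumptions. It encodes only the stated rules: `USE_CELERY_FALLBACK` forces local, inputs under `/shared-tmp` must stay local, and `FFMPEG_SERVICE_URL` matters only for a single call.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

SHARED_TMP = PurePosixPath("/shared-tmp")


def choose_ffmpeg_backend(
    input_paths: Iterable[str],
    use_celery_fallback: bool,
    ffmpeg_service_url: Optional[str],
) -> str:
    """Pick where one ffmpeg call runs: "local" (Celery ffmpeg queue) or "cloud_run"."""
    if use_celery_fallback:
        return "local"  # USE_CELERY_FALLBACK=true sends everything to local Celery
    for p in input_paths:
        path = PurePosixPath(p)
        if path == SHARED_TMP or SHARED_TMP in path.parents:
            return "local"  # Cloud Run cannot read files on the worker's /shared-tmp
    return "cloud_run" if ffmpeg_service_url else "local"
```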
### MongoDB `$jsonSchema` validator

@@ -173,3 +196,9 @@ backend/app/migrations/run.py (new file)

```
frontend/src/routes/Dashboard.tsx (Processing counter fix)
frontend/src/routes/jobs/JobsList.tsx (AllJobs filter fix)
```

**On the server (only `.env.production`, not in git):**

```
/opt/projects/video-accessibility/.env.production
FFMPEG_WORKER_CONCURRENCY=20 → 4
```
@@ -842,3 +842,6 @@ tags: [daily]

- 21:33 | `video-accessibility`
- **Asked:** Debugged production queue API 403 error and memory issues | Reduced FFMPEG_WORKER_CONCURRENCY from 20 to 4 to prevent OOM-kill, allowing worker to process all 3 queues and concatenate segments | Worker configuration
- **Done:** FFMPEG worker memory optimization | FFMPEG_WORKER_CONCURRENCY parameter reduced, worker queue processing enabled | Environment configuration, Worker service
- 21:40 (2min) | `video-accessibility`
- **Asked:** Debug 403 Forbidden error on production queue stats API endpoint.
- **Done:** Identified authentication issue with GET request to /video-accessibility/api/v1/admin/production/queue-stats endpoint.
@@ -23,8 +23,8 @@ This 3-hop pattern works for hundreds of articles without vector search.

| [[wiki/tech-patterns/_index\|tech-patterns/]] | Recurring tech stacks: FastAPI, React/Vite, Next.js, Azure AD, AI, Box, One2Edit, Redis/Celery, cost-tracker | 17 |
| [[wiki/architecture/_index\|architecture/]] | Cross-cutting architectural patterns: Docker Compose, multi-agent AI, GCP timeout, RAG, hotfolder, optical-dev deploy, cost-tracker, new-project checklist, troubleshooting playbooks, ADR log, Cloud Run Jobs | 11 |
| [[wiki/client-knowledge/_index\|client-knowledge/]] | Per-client notes for Ford, H&M, L'Oréal, Barclays, Ferrero, 3M | 6 |
| [[wiki/concepts/_index\|concepts/]] | Atomic knowledge extracted from Claude Code sessions | 89 |
| [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts: FastAPI+Azure AD+Docker trinity, AI→cost-tracker, Apache+Vite basePath, GCP→REST polling, Box+hotfolder, Docker DNS+AdGuard, Celery prefork×faster_whisper memory stacking | 10 |
| [[wiki/qa/_index\|qa/]] | Filed answers to queries (saved with `--file-back`) | 0 |
| [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 42 |
| [[wiki/web-agency/_index\|web-agency/]] | AI-assisted website building & selling: Claude Code, Nanobanana 2, Kling, LaunchPath MCP | 9 |
@@ -100,5 +100,9 @@

| [[wiki/concepts/celery-queue-worker-specialization]] | Named Celery queues: only the container consuming that queue processes tasks — fix bugs in specialised workers by rebuilding THAT container | daily/2026-04-30.md | 2026-04-30 |
| [[wiki/concepts/gcs-resumable-upload-pattern]] | Browser → backend creates GCS Resumable Session URI → browser uploads chunks directly to GCS, bypassing LB/Apache; 8 MB chunks, 308=continue, resume via Range header | daily/2026-04-30.md | 2026-04-30 |
| [[wiki/concepts/celery-prefork-pool-startup-memory]] | Celery prefork forks ALL CONCURRENCY workers at startup — CONCURRENCY=20 × 120 MB = 2.4 GB before first task; OOM before any work | daily/2026-04-30.md | 2026-04-30 |
| [[wiki/concepts/sudo-git-clone-root-ownership]] | `sudo git clone` makes all files root-owned — subsequent user `git pull` fails with Permission denied on .git/FETCH_HEAD; fix: chown -R | daily/2026-04-30.md | 2026-04-30 |
| [[wiki/concepts/python-fastapi-module-level-singletons]] | `settings = Settings()` at module import level crashes pytest when env vars aren't set — guard with `@lru_cache` function or lazy `@property` | daily/2026-04-30.md | 2026-04-30 |

<!-- Articles added automatically by compile.py -->
<!-- Format: | [[concepts/slug]] | One-line summary | daily/YYYY-MM-DD.md | date | -->
wiki/concepts/celery-prefork-pool-startup-memory.md (new file, 82 lines)

@@ -0,0 +1,82 @@

---
title: "Celery Prefork Pool — All Workers Fork at Startup"
aliases:
- celery-prefork-startup-memory
- celery-concurrency-oom
tags:
- celery
- python
- docker
- memory
- worker
sources:
- "daily/2026-04-30.md"
created: 2026-04-30
updated: 2026-04-30
---

# Celery Prefork Pool — All Workers Fork at Startup

Celery's default `prefork` pool forks **all** `CONCURRENCY` worker processes immediately at startup, not lazily on the first task. Each child starts as a copy of the parent interpreter with all imports already loaded; copy-on-write sharing erodes quickly in CPython because reference-count updates write to the shared pages, so each process tends to grow toward its own full footprint. With `CONCURRENCY=20` and 120 MB per process, that is 2.4 GB of RAM consumed before a single task is processed: enough to OOM-kill a container and stall a pipeline for 15+ minutes while the cause stays invisible in application logs.
## Key Points

- `prefork` (the default pool type) forks N processes at `celery worker` start time
- Each process is a full Python interpreter with all imports loaded
- Memory ≈ `CONCURRENCY × per_worker_MB` (plus the parent process) is consumed before any task runs
- OOM manifests as the container being killed, not as a Python exception
- `CONCURRENCY=1` is safe but eliminates parallelism; tune with the formula below

## Details

### Safe Concurrency Formula

```
CONCURRENCY = floor(container_memory_MB / per_worker_MB)
```
Measure `per_worker_MB` with:

```bash
# Start one worker: prefork with concurrency=1 runs a parent plus one child process
celery -A app worker --concurrency=1 &
# RSS is the 6th column of `ps aux`, in KiB; '[c]elery' stops grep from matching itself
ps aux | grep '[c]elery'
```
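
On Linux the same number can be read for each pool child from `/proc/<pid>/status`, whose `VmRSS` line is reported in kB. A small parser, sketched as a hypothetical helper (pass it the status text of each child PID that `ps` lists):

```python
import re

_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def rss_mb_from_status(status_text: str) -> float:
    """Parse the VmRSS line of a Linux /proc/<pid>/status file and return MiB."""
    match = _VMRSS.search(status_text)
    if match is None:
        raise ValueError("no VmRSS line (kernel thread or non-Linux system?)")
    return int(match.group(1)) / 1024  # /proc reports kB (KiB); convert to MiB


def rss_mb(pid: int) -> float:
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="utf-8", errors="replace") as fh:
        return rss_mb_from_status(fh.read())
```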
Common per-process baselines (approximate; model sizes vary a lot):
- Pure Python FastAPI worker: ~60–80 MB
- Worker that imports `faster_whisper` and loads a model in every process: ~400–800 MB per worker
- Worker that imports `torch`: 300–500 MB baseline
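
The formula as a function, with a sanity check on the startup total. The optional `headroom_mb` (room for the Celery parent process and per-task spikes) is an assumption added in this sketch, not part of the formula above; with `headroom_mb=0` it reduces exactly to `floor(container_memory_MB / per_worker_MB)`.

```python
import math


def startup_mb(concurrency: int, per_worker_mb: float) -> float:
    """RAM held by the pool right after `celery worker` starts: CONCURRENCY x per_worker_MB."""
    return concurrency * per_worker_mb


def safe_concurrency(container_mb: float, per_worker_mb: float, headroom_mb: float = 0.0) -> int:
    """floor((container_mb - headroom_mb) / per_worker_mb).

    headroom_mb is an assumption of this sketch; with headroom_mb=0 this is
    exactly the formula above.
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    workers = math.floor((container_mb - headroom_mb) / per_worker_mb)
    if workers < 1:
        raise ValueError("container too small for even one worker process")
    return workers
```

For the ffmpeg-worker case (1 GB limit taken as 1024 MB, ~120 MB per process), `startup_mb(20, 120)` gives 2400 MB, well over the limit, while `safe_concurrency(1024, 120)` gives 8. The value of 4 chosen on optical-dev stays below that, leaving room for per-task memory that the formula does not model.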
### Alternative Pool Types

| Pool | Startup behaviour | Use case |
|------|------------------|----------|
| `prefork` (default) | All N processes fork immediately | CPU-bound tasks |
| `solo` | Single-process, no fork | Dev / low-memory containers |
| `gevent` / `eventlet` | Green threads, shared process | I/O-bound tasks |
| `threads` | OS threads, shared process | I/O-bound, simpler than gevent |

Switch with the worker's `--pool` option (e.g. `--pool=solo` or `--pool=gevent`) or the `worker_pool` setting; a `CELERY_POOL=solo` environment variable only takes effect if the container's entrypoint passes it through to `--pool`.

### Stacking with ML Libraries

If a worker loads a model at module level (e.g. `faster_whisper`, `torch`), that model ends up in **every** forked process. With `CONCURRENCY=4` and a 400 MB model, startup RAM = 1.6 GB before any inference runs. See [[wiki/connections/celery-prefork-faster-whisper-memory-stacking]].

### Symptoms

- Container killed within 10–30 seconds of `docker compose up`
- No Python traceback; the OOM killer logs appear in `dmesg` / `docker events`
- `docker stats` shows memory spiking to the container limit, then dropping (restart)
- Tasks never start processing; the queue builds up
- `docker inspect` is not conclusive: in the 2026-04-30 incident the container reported `ExitCode=0, OOMKilled=False` while the OOM killer was killing processes, so check `dmesg` too

## Related Concepts

- [[wiki/concepts/faster-whisper-startup-memory]] — model loads at startup in each worker process
- [[wiki/connections/celery-prefork-faster-whisper-memory-stacking]] — the combined effect when both apply
- [[wiki/concepts/docker-compose-cpu-limits-env]] — memory limits in Compose override files
- [[wiki/concepts/celery-queue-worker-specialization]] — specialised workers, smaller CONCURRENCY per service

## Sources

- [[daily/2026-04-30.md]] — Session 21:37, ffmpeg-worker OOM diagnosis; CONCURRENCY=20, 2.4 GB pre-task RAM, 15-minute pipeline stall
wiki/concepts/python-fastapi-module-level-singletons.md (new file, 97 lines)

@@ -0,0 +1,97 @@

---
title: "Module-Level Singletons Break pytest — Use Lazy Initialisation"
aliases:
- module-level-settings-pytest
- lazy-singleton-fastapi
- settings-import-time-instantiation
tags:
- python
- fastapi
- pytest
- testing
- pydantic
sources:
- "daily/2026-04-30.md"
created: 2026-04-30
updated: 2026-04-30
---

# Module-Level Singletons Break pytest — Use Lazy Initialisation

Instantiating `Settings()`, `SomeService()`, or any object that reads environment variables at **module import time** makes pytest fail when those env vars are not set in the test environment, even for tests that never call that module's functions. Python executes a module's top-level code the first time it is imported, so `settings = Settings()` at the top of `config.py` runs as soon as any test module imports something that, directly or transitively, imports `config.py`.
## Key Points

- `Settings()` at module level runs at `import` time, not at call time
- pytest imports every test module during collection, so a test for `routes/health.py` may trigger `config.py` → `Settings()` → `ValidationError`
- The failure looks like a config error, not a test design problem
- Fix: wrap in an `@lru_cache` function or a `@property` so instantiation is deferred to first use
- Pydantic `BaseSettings` validation runs in `__init__`; there is no "lazy" mode

## Details

### Anti-Pattern

```python
# config.py  <- top-level code runs at import time
from pydantic_settings import BaseSettings  # Pydantic v2 location of BaseSettings

class Settings(BaseSettings):
    MONGO_URL: str  # required: no default

settings = Settings()  # raises ValidationError if MONGO_URL is not set in the test env

# service.py
db_service = DatabaseService()  # same problem (hypothetical class reading settings in __init__)
```

### Fix 1 — `@lru_cache` function (recommended for FastAPI)

```python
from functools import lru_cache

from fastapi import APIRouter, Depends

from .config import Settings  # the BaseSettings class only; no module-level instance

router = APIRouter()

@lru_cache
def get_settings() -> Settings:
    return Settings()  # built on the first call, then served from the cache

# Use as FastAPI dependency
@router.get("/")
async def handler(settings: Settings = Depends(get_settings)):
    ...
```

Tests can override with `app.dependency_overrides[get_settings] = lambda: FakeSettings()`.
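
One side effect of `@lru_cache` is that the first settings object built in a test run is reused by every later call, so a test that changes environment variables has to clear the cache. A stdlib-only sketch, with a hypothetical `EnvSettings` dataclass standing in for the Pydantic `Settings` so it runs without pydantic installed; `cache_clear()` is the standard method that `functools.lru_cache` adds to the wrapped function.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical stand-in for a BaseSettings subclass: reads its field from the environment."""
    mongo_url: str

    @classmethod
    def from_env(cls) -> "EnvSettings":
        return cls(mongo_url=os.environ["MONGO_URL"])  # KeyError if unset


@lru_cache
def get_settings() -> EnvSettings:
    # Defining this function reads nothing; the environment is read on the first call
    # and again only after get_settings.cache_clear().
    return EnvSettings.from_env()
```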
### Fix 2 — `@property` on a config holder

```python
class _Config:
    _settings: Settings | None = None  # class-level default; set on the instance at first access

    @property
    def settings(self) -> Settings:
        if self._settings is None:
            self._settings = Settings()  # first access: read env vars and validate now
        return self._settings

config = _Config()  # safe: no Settings() call yet
```
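### Fix 3 — pytest `monkeypatch` / `.env` file

For tests that genuinely need the real Settings, provide env vars via a `conftest.py`. Fixtures run only after the test modules have been imported, so a `monkeypatch` fixture helps only code that creates `Settings` lazily (Fix 1 or Fix 2). A module-level `settings = Settings()` needs the variables set at the top of `conftest.py`, which pytest imports before collecting the test modules next to it:

```python
# conftest.py: top-level code runs before pytest imports the test modules
import os
import pytest

os.environ.setdefault("MONGO_URL", "mongodb://localhost:27017/test")  # for module-level Settings()
os.environ.setdefault("SECRET_KEY", "test-secret")

@pytest.fixture(autouse=True)
def env_vars(monkeypatch):  # per-test values; only seen by lazily created Settings
    monkeypatch.setenv("MONGO_URL", "mongodb://localhost:27017/test")
    monkeypatch.setenv("SECRET_KEY", "test-secret")
```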
### Why Python 3.14 Makes This Worse

In this session, Python 3.14 had no pre-built wheels for the Rust-extension packages (`pydantic-core`, `cryptography`). Without a matching wheel, the installer has to build them from source (which needs a Rust toolchain) or fall back to an older, possibly pure-Python release that may behave differently or lack functionality. Pin the Python range in `pyproject.toml` to match production, e.g. `python = "~3.11"` if production runs 3.11 (`^3.11` means `>=3.11,<4.0` and therefore still admits 3.14), and run tests in Docker with the production Python version.

## Related Concepts

- [[wiki/concepts/poetry-docker-version-mismatch]] — Poetry / Python version mismatch causing silent failures
- [[wiki/concepts/time-sleep-blocks-asyncio]] — another class of import-time footgun in async FastAPI

## Sources

- [[daily/2026-04-30.md]] — Session 13:36, test suite fixes; module-level Settings() crashes, aiohttp mock pattern
wiki/concepts/sudo-git-clone-root-ownership.md (new file, 74 lines)

@@ -0,0 +1,74 @@

---
title: "sudo git clone Makes Files Root-Owned — User git Pull Fails"
aliases:
- sudo-git-clone-root-files
- git-permission-denied-fetch-head
tags:
- git
- linux
- server
- permissions
sources:
- "daily/2026-04-30.md"
created: 2026-04-30
updated: 2026-04-30
---

# sudo git clone Makes Files Root-Owned — User git Pull Fails

Running `sudo git clone` on a server creates every file and directory, including the entire `.git/` folder, owned by `root`. Any subsequent `git pull` or `git fetch` run as a regular user fails with `Permission denied` on `.git/FETCH_HEAD` (or other files under `.git/`, such as `index.lock`), even though the user can read the working tree.

## Key Points

- `sudo git clone` → all files owned by `root:root`
- `git pull` as a non-root user hits a write permission error on `.git/FETCH_HEAD`
- The error message looks like a network or credential issue but is purely a filesystem ownership problem
- Fix: `sudo chown -R $USER:$USER /opt/project`
- Prevention: never use `sudo` for `git clone` unless the repo must be root-owned
## Details

### The Error

```
error: cannot open .git/FETCH_HEAD: Permission denied
```

or

```
fatal: Unable to create '/opt/project/.git/index.lock': Permission denied
```

Git 2.35.2 and later may stop even earlier: because the repository is owned by another user, Git refuses to work in it and points to the `safe.directory` setting. The `chown` below fixes that case as well.

### Fix

```bash
sudo chown -R $USER:$USER /opt/project
# Verify: .git/ and its contents should now list your user as owner
ls -la /opt/project/.git/
```
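
To confirm the fix (or to find stray root-owned files left by a later `sudo git ...` command), a small audit can list every path in the repository that is not owned by the current user. A stdlib-only sketch; `foreign_owned` is a hypothetical helper name, and `os.getuid()` makes it POSIX-only. An empty list after `chown -R` means ownership is clean.

```python
import os
from typing import List


def foreign_owned(repo: str) -> List[str]:
    """Return paths under `repo` (including .git/) not owned by the current user."""
    uid = os.getuid()
    bad = [repo] if os.lstat(repo).st_uid != uid else []
    for root, dirs, files in os.walk(repo):
        for name in dirs + files:
            path = os.path.join(root, name)
            # lstat: check symlinks themselves rather than their targets
            if os.lstat(path).st_uid != uid:
                bad.append(path)
    return bad
```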
### Prevention

If deploying to `/opt/` or `/srv/` (root-owned dirs), create the directory first, then clone as the service user:

```bash
sudo mkdir -p /opt/project
sudo chown deploy:deploy /opt/project
git clone git@github.com:org/project.git /opt/project   # run as the deploy user, without sudo
```

Or use `sudo -u deploy git clone ...` to clone as the deploy user directly.

### Why This Happens

`sudo` switches the effective UID to root, and `git clone` creates every file with the current effective UID as owner. `git clone` has no `--chown` option (unlike a Dockerfile `COPY --chown`), so ownership has to be fixed afterwards.

## Related Concepts

- [[wiki/concepts/monorepo-deploy-script-pitfall]] — another class of silent git failure during deploys
- [[wiki/concepts/python-service-deployment-dotenv]] — deploy checklist for Python services

## Sources

- [[daily/2026-04-30.md]] — Session 12:11, re-deploy after project folder deletion; sudo git clone footgun discovered
@@ -15,5 +15,7 @@

| [[wiki/connections/box-api-hotfolder-pattern]] | Box API ↔ hotfolder daemon — always paired; archive pattern prevents double-processing | 2026-04-27 | 2026-04-27 |
| [[wiki/connections/docker-dns-adguard-split-horizon]] | Docker DNS ↔ AdGuard split-horizon — Docker containers inherit router DNS, not AdGuard; explicit dns: config required | daily/2026-04-28.md | 2026-04-28 |
| [[wiki/connections/celery-prefork-faster-whisper-memory-stacking]] | Celery prefork fork-all ↔ faster_whisper model-at-startup — CONCURRENCY × model_size GB consumed before first task | daily/2026-04-30.md | 2026-04-30 |

<!-- Articles added automatically by compile.py -->
<!-- Format: | [[connections/slug]] | ConceptA ↔ ConceptB | daily/YYYY-MM-DD.md | date | -->
@@ -0,0 +1,86 @@

---
title: "Connection: Celery Prefork × faster_whisper — Memory Stacking"
connects:
- "concepts/celery-prefork-pool-startup-memory"
- "concepts/faster-whisper-startup-memory"
sources:
- "daily/2026-04-30.md"
created: 2026-04-30
updated: 2026-04-30
---

# Connection: Celery Prefork × faster_whisper — Memory Stacking

## The Connection

Two independent startup-memory behaviours combine multiplicatively when `faster_whisper` is used inside a Celery worker module:

1. **Celery prefork** forks ALL `CONCURRENCY` worker processes at `celery worker` start; each is a full Python interpreter with all imports loaded.
2. **faster_whisper** loads the full transcription model into RAM as soon as `WhisperModel(...)` is constructed. When that happens at module level (i.e. at import time) or in a worker start-up signal handler, every process holds a copy before any task runs.

Result: `CONCURRENCY=4` with a 400 MB Whisper model = **1.6 GB** consumed before the first transcription task is dequeued.

## Key Insight

> Neither behaviour is a bug in isolation; the danger is invisible until they are combined in the same container.

The `faster-whisper-startup-memory` article documents the per-container model loading cost. The `celery-prefork-pool-startup-memory` article documents the per-worker process forking cost. When they stack, the formula becomes:

```
total_startup_RAM = CONCURRENCY × (base_worker_MB + model_size_MB)
```

Example with the `large-v3` model (~1.5 GB) and `CONCURRENCY=4`:

```
4 × (80 MB interpreter + 1500 MB model) = 6320 MB ≈ 6.3 GB before the first task
```

A container with a 4 GB memory limit is OOM-killed before it processes anything.

## Evidence

- Session 21:37 (2026-04-30): ffmpeg-worker with `CONCURRENCY=20`, ~120 MB/process → 2.4 GB, container OOM-killed, 15-minute pipeline stall
- The stall was compounded because Celery silently retries tasks that were in-flight when the worker died, creating a second wave of OOM on restart

## Solutions

### Option A — Reduce concurrency to match model size

```
CONCURRENCY = floor(container_memory_MB / (base_MB + model_MB))
```
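
Both formulas in one small sketch. The inputs are the estimates used in this article (80 MB per interpreter, ~1500 MB for `large-v3`), not measurements.

```python
import math


def startup_ram_mb(concurrency: int, base_mb: float, model_mb: float) -> float:
    """total_startup_RAM = CONCURRENCY x (base_worker_MB + model_size_MB)"""
    return concurrency * (base_mb + model_mb)


def max_concurrency(container_mb: float, base_mb: float, model_mb: float) -> int:
    """CONCURRENCY = floor(container_memory_MB / (base_MB + model_MB)); 0 means nothing fits."""
    return math.floor(container_mb / (base_mb + model_mb))
```

With a 4 GB limit (taken as 4096 MB) the formula allows 2 `large-v3` workers, not 4; with a 1 GB limit it returns 0, meaning not even one process fits.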
### Option B — Separate transcription into its own single-worker container

Keep `CONCURRENCY=1` for the whisper worker and scale by adding containers, not by increasing CONCURRENCY. Each container has exactly one model copy.

### Option C — Load model lazily (inside the task, not at import)

```python
from celery import Celery
from faster_whisper import WhisperModel

app = Celery("transcribe")  # broker/backend configuration omitted

_model = None  # one copy per pool process, created on first use

@app.task
def transcribe(audio_path: str) -> str:
    global _model
    if _model is None:
        _model = WhisperModel("large-v3")
    segments, _info = _model.transcribe(audio_path)  # segments is a lazy generator
    return " ".join(segment.text.strip() for segment in segments)  # serialisable task result
```

Downside: the first task in each process pays the load latency (~5–15 s); later tasks in the same process reuse the loaded model. Lazy loading lowers startup memory but not the peak: once every pool process has run a transcription task, all `CONCURRENCY` copies are resident, so combine it with Option A or B.

### Option D — Use `solo` or `threads` pool

The `solo` pool (`--pool=solo`, or `CELERY_POOL=solo` if the entrypoint passes it to `--pool`) runs one task at a time in the main process with no forking; the `threads` pool runs several tasks as threads of a single process. Either way there is only one model copy. Appropriate for GPU workers where parallelism is handled at the GPU level.
## Related Concepts

- [[wiki/concepts/celery-prefork-pool-startup-memory]] — Celery fork-all-at-startup behaviour
- [[wiki/concepts/faster-whisper-startup-memory]] — model loaded at container start
- [[wiki/concepts/celery-queue-worker-specialization]] — isolating whisper work to dedicated containers
- [[wiki/concepts/docker-compose-cpu-limits-env]] — setting memory limits in Compose

## Sources

- [[daily/2026-04-30.md]] — Session 21:37, Celery ffmpeg-worker OOM; identified as combined prefork + model-loading issue
@@ -1,6 +1,12 @@

# Build Log

## [2026-04-30T23:30:00+01:00] compile | 2026-04-30.md (pass 2)
- Source: daily/2026-04-30.md
- Articles created: [[wiki/concepts/celery-prefork-pool-startup-memory]], [[wiki/concepts/sudo-git-clone-root-ownership]], [[wiki/concepts/python-fastapi-module-level-singletons]], [[wiki/connections/celery-prefork-faster-whisper-memory-stacking]]
- Articles updated: (none)
- Index updates: [[wiki/concepts/_index]] (86→89); [[wiki/connections/_index]] (9→10); [[wiki/_master-index]] (concepts 86→89, connections 9→10)

## [2026-04-30T21:00:00+01:00] compile | 2026-04-30.md
- Source: daily/2026-04-30.md
- Articles created: [[wiki/concepts/pydub-ffmpeg-silent-dependency]], [[wiki/concepts/lameenc-bytearray-gcs-upload]], [[wiki/concepts/apache-mod-alias-proxy-priority]], [[wiki/concepts/faster-whisper-startup-memory]], [[wiki/concepts/celery-redis-queue-flush-on-deterministic-error]], [[wiki/concepts/cline-lm-studio-openai-compatible]], [[wiki/concepts/celery-queue-worker-specialization]], [[wiki/concepts/gcs-resumable-upload-pattern]]