vault backup: 2026-04-28 22:21:29

Vadym Samoilenko 2026-04-28 22:21:29 +01:00
parent d93c9f4516
commit ef2302eb52
7 changed files with 424 additions and 2 deletions


@@ -23,7 +23,7 @@ This 3-hop pattern works for hundreds of articles without vector search.
| [[wiki/tech-patterns/_index\|tech-patterns/]] | Recurring tech stacks: FastAPI, React/Vite, Next.js, Azure AD, AI, Box, One2Edit, Redis/Celery, cost-tracker | 13 |
| [[wiki/architecture/_index\|architecture/]] | Cross-cutting architectural patterns: Docker Compose, multi-agent AI, GCP timeout, RAG, hotfolder, optical-dev deploy, cost-tracker, new-project checklist, troubleshooting playbooks, ADR log | 10 |
| [[wiki/client-knowledge/_index\|client-knowledge/]] | Per-client notes for Ford, H&M, L'Oréal, Barclays, Ferrero, 3M | 6 |
| [[wiki/concepts/_index\|concepts/]] | Atomic knowledge extracted from Claude Code sessions | 54 |
| [[wiki/concepts/_index\|concepts/]] | Atomic knowledge extracted from Claude Code sessions | 57 |
| [[wiki/connections/_index\|connections/]] | Cross-cutting insights linking 2+ concepts: FastAPI+Azure AD+Docker trinity, AI→cost-tracker, Apache+Vite basePath, GCP→REST polling, Box+hotfolder, Docker DNS+AdGuard | 9 |
| [[wiki/qa/_index\|qa/]] | Filed answers to queries (saved with `--file-back`) | 0 |
| [[wiki/homelab/_index\|homelab/]] | Self-hosted infra: Proxmox install, IOMMU/PCI passthrough, hypervisor setup, budget builds, HP Elitedesk G3, Homarr API + Apps + Boards + Certificates + Integrations + Settings + Tasks + AdGuard + Clock + Docker Stats + Docker Integration + Download Client + Firewall + Proxmox Integration + Radarr + Readarr + Sonarr + Bookmarks + Calendar + Icons + App Widget + Weather + GitHub + Nextcloud + qBittorrent + RSS Feed + Speedtest Tracker + System Health Monitoring + System Resources + Services Map + Media Stack | 38 |


@@ -60,5 +60,9 @@
| [[wiki/concepts/python-iso-z-suffix]] | Python < 3.11 `fromisoformat()` rejects the `Z` suffix produced by JS `toISOString()`; replace `Z` with `+00:00` before parsing | daily/2026-04-24.md | 2026-04-24 |
| [[wiki/concepts/gemini-conversation-cost-scaling]] | Gemini bills full accumulated conversation history per turn — cost grows quadratically; backfill scripts must account for this | daily/2026-04-24.md | 2026-04-24 |
| [[wiki/concepts/asyncio-contextvar-task-boundary]] | Python ContextVar is NOT propagated through asyncio.wait_for / create_task — pass user_id as explicit parameter | daily/2026-04-27.md | 2026-04-27 |
| [[wiki/concepts/pydantic-v2-alias-id-gotcha]] | Pydantic v2 Field(alias="_id") serializes JSON key as "_id" not "id" — frontend .id is undefined; fix with _from_doc() helper | daily/2026-04-27.md | 2026-04-27 |
| [[wiki/concepts/php-display-errors-json-leak]] | PHP display_errors=1 prepends HTML warnings to JSON — "Unexpected token '<'" is the diagnostic signal; ini_set order matters | daily/2026-04-27.md | 2026-04-27 |
<!-- Articles added automatically by compile.py -->
<!-- Format: | [[concepts/slug]] | One-line summary | daily/YYYY-MM-DD.md | date | -->


@@ -0,0 +1,113 @@
---
title: "Python asyncio — ContextVar Does Not Propagate Across Task Boundaries"
aliases: [asyncio-contextvar, python-contextvar-wait-for, contextvar-task-boundary]
tags: [python, asyncio, concurrency, debugging, fastapi, gotcha]
sources:
- "daily/2026-04-27.md"
created: 2026-04-27
updated: 2026-04-27
---
# Python asyncio — ContextVar Does Not Propagate Across Task Boundaries
Every asyncio task runs in a copy of the context that was current when the task was created, and that copy is a snapshot. A `ContextVar` set before the task is spawned is visible inside it; a value set afterwards in the parent, or set in a different task or thread, is not, and values set inside the child never flow back to the parent. `asyncio.create_task()`, `asyncio.ensure_future()`, and (depending on the Python version) `asyncio.wait_for()` all introduce such a boundary. Setting the var in a middleware and reading it deep inside a task is therefore fragile: whether the value is present depends entirely on which context was copied, and when.
In practice, the failure mode is: a FastAPI middleware sets `current_user_ctx.set(user)`, but a background task invoked via `asyncio.wait_for()` reads `current_user_ctx.get()` and gets the default value (`""`) instead. The bug is silent: no exception, just an empty string in cost tracker events or missing audit logs.
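The snapshot behavior can be checked with a few lines of standard-library code. The following is a minimal sketch, not project code; the names `user_ctx`, `read_user`, `set_user_in_child`, and `demo` are illustrative assumptions.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical variable, standing in for a "current user" ContextVar.
user_ctx: ContextVar[str] = ContextVar("user", default="")

async def read_user() -> str:
    await asyncio.sleep(0)          # yield once so the parent keeps running first
    return user_ctx.get()

async def set_user_in_child(value: str) -> None:
    user_ctx.set(value)             # only changes the child's copy of the context

async def demo() -> dict:
    results = {}
    user_ctx.set("alice")
    task = asyncio.create_task(read_user())   # context is copied here
    user_ctx.set("bob")                       # set after the copy was taken
    results["child_sees"] = await task

    await asyncio.create_task(set_user_in_child("carol"))
    results["parent_after_child_set"] = user_ctx.get()

    results["direct_await"] = await read_user()  # no new task, same context
    return results

snapshot_results = asyncio.run(demo())
print(snapshot_results)
```

The child task reports `alice` because that was the value when its context was copied; the parent still reads `bob` after the second child sets `carol`, and a direct `await` reads `bob` because it shares the parent's context.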
## Key Points
- **`ContextVar` is snapshot-based**: when a task is created, it inherits a copy of the current context — changes to the original context after that point are invisible
- **`asyncio.wait_for()` creates a new task**, so the same propagation rules apply — do NOT rely on ContextVar being readable inside `wait_for`-wrapped coroutines
- **The failure is silent**: `ContextVar.get()` returns the default value (`""`, `None`) without raising — the app continues working with empty/wrong user identity
- **The fix is always the same**: pass the value as an explicit function parameter — `user_id: str` all the way down the call chain
- Applies to any ContextVar use: current user, request ID, tenant ID, tracing spans
## Details
### How the Bug Manifests
```python
# ❌ BROKEN: ContextVar + asyncio.wait_for
import asyncio
from contextvars import ContextVar

_user_ctx: ContextVar[str] = ContextVar("user", default="")

async def process_request(user_id: str, prompt: str):
    _user_ctx.set(user_id)                # set in this coroutine's context
    result = await asyncio.wait_for(      # may run the coroutine in a new task
        call_ai_with_tracking(prompt),    # ← _user_ctx.get() == "" here in the incident
        timeout=300,
    )
    return result

async def call_ai_with_tracking(prompt: str):
    # Returned "" in the incident: a task reads its own context snapshot,
    # which holds only the values present in the context it copied.
    user = _user_ctx.get()
    await record(user_external_id=user, ...)
```
### The Fix: Explicit Parameters
```python
# ✅ CORRECT: explicit parameter
import asyncio

async def process_request(user_id: str, prompt: str):
    result = await asyncio.wait_for(
        call_ai_with_tracking(prompt, user_id=user_id),  # pass explicitly
        timeout=300,
    )
    return result

async def call_ai_with_tracking(prompt: str, user_id: str):
    await record(user_external_id=user_id, ...)  # always available
```
This pattern must be propagated down the entire call chain — every function between `process_request` and `record()` needs to accept and forward `user_id`.
### When ContextVar DOES Work
ContextVar is reliable in the main request coroutine and synchronous call stack:
```python
# ✅ WORKS: same coroutine, no task boundary
async def process_request(user_id: str, prompt: str):
    _user_ctx.set(user_id)
    result = await call_directly(prompt)  # NOT wait_for, NOT create_task
    return result

async def call_directly(prompt: str):
    user = _user_ctx.get()  # works: same task and context, no task creation
```
ContextVar also works for FastAPI request-scope middleware that sets a var, then reads it in the same request handler — as long as no background tasks are spawned within that request.
### Task Boundaries That Break ContextVar
| API | Creates new context | ContextVar broken |
|-----|--------------------|--------------------|
| `await coroutine()` | No | No |
| `asyncio.wait_for(coro, timeout)` | Yes | **Yes** |
| `asyncio.create_task(coro)` | Yes | **Yes** |
| `asyncio.ensure_future(coro)` | Yes | **Yes** |
| `asyncio.gather(*coros)` | Yes (each) | **Yes** |
| `loop.run_in_executor(executor, fn)` | Not copied (runs in the worker thread's own context) | **Yes** |
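The thread-based row of the table can be verified directly. The sketch below assumes a standard CPython build, Python 3.9 or later (for `asyncio.to_thread`); `request_id`, `read_request_id`, and `compare_thread_apis` are illustrative names. It contrasts `loop.run_in_executor`, which does not carry the caller's context into the worker thread, with `asyncio.to_thread`, which is documented to propagate the current context.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical variable for illustration only.
request_id: ContextVar[str] = ContextVar("request_id", default="")

def read_request_id() -> str:
    # Runs in a worker thread in both cases below.
    return request_id.get()

async def compare_thread_apis() -> dict:
    request_id.set("req-123")
    loop = asyncio.get_running_loop()
    # The worker thread has its own context, so the default "" comes back.
    via_executor = await loop.run_in_executor(None, read_request_id)
    # to_thread runs the function inside a copy of the current context.
    via_to_thread = await asyncio.to_thread(read_request_id)
    return {"run_in_executor": via_executor, "to_thread": via_to_thread}

thread_results = asyncio.run(compare_thread_apis())
print(thread_results)
```

Even where `to_thread` makes the value visible, passing the value as an explicit argument remains the more robust choice, for the reasons given in the rest of this article.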
### Real Incident (2026-04-27)
NotebookLM (FastAPI backend): a `ContextVar` set in the route handler with `set_user_ctx(user.email)` was read inside `asyncio.wait_for(generate_notebook(...), timeout=300)`. The `_user` ContextVar returned `""` inside the task — cost tracker events were recorded with `user_external_id=""`.
Fix: removed `ContextVar` entirely. Changed `generate_notebook(prompt)` to `generate_notebook(prompt, user_external_id=user.email)`. All callers updated to pass the value explicitly.
### LlamaIndex Note
LlamaIndex callbacks and event system also use `ContextVar` internally. If wrapping LlamaIndex calls with `asyncio.wait_for`, trace context may be lost. Additionally, `extract_llama_tokens()` helpers can return `(0, 0)` — always add a fallback:
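```python
# A (0, 0) tuple is truthy, so a bare `or` never reaches the fallback; test the counts instead.
tokens = extract_llama_tokens(response)
input_tok, output_tok = tokens if tokens and any(tokens) else (len(prompt) // 4, 200)
```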
## Related Concepts
- [[wiki/tech-patterns/cost-tracker-integration]] — cost tracker integration where this bug was discovered; Step 4 documents the explicit-parameter pattern
- [[wiki/concepts/preflight-record-pattern]] — the preflight/record calls that received empty `user_external_id` due to this bug
- [[wiki/tech-patterns/python-ai-agents]] — Python AI agent patterns; explicit parameter passing applies throughout
## Sources
- [[daily/2026-04-27.md]] — NotebookLM cost tracker integration: `_user` ContextVar returned `""` inside `asyncio.wait_for`-wrapped notebook generation; confirmed same issue in video-accessibility; fix was explicit `user_external_id` parameter throughout call chain


@@ -2,7 +2,7 @@
title: LiteLLM as Pricing Source
tags: [concept, ai, cost-tracking, pricing, llm]
created: 2026-04-27
updated: 2026-04-27
updated: 2026-04-28
---
# LiteLLM as Pricing Source
@@ -54,6 +54,24 @@ These are defined in `pricing/models.yaml` in the cost-tracker repo. See [[wiki/
3. Old price records kept forever for historical reporting
4. To freeze at a known-good version: set `LITELLM_COMMIT_HASH` env var
## Gemini Provider Key Gotcha
LiteLLM stores Gemini model prices under the provider key `vertex_ai-language-models`, **not** `google`. If your integration sends `provider: "google"` in API calls, the cost-tracker's pricing engine won't find a match and returns `cost_usd = null`.
**Fix:** add a provider alias in `pricing_engine.py` that maps `"google"` to `"vertex_ai-language-models"`. This keeps client code readable (`"google"`) while matching LiteLLM's internal naming:
```python
PROVIDER_ALIASES = {
    "google": "vertex_ai-language-models",
}

def lookup_price(provider: str, model: str):
    canonical = PROVIDER_ALIASES.get(provider, provider)
    return model_prices.get(f"{canonical}/{model}")
```
Always verify the exact provider key by searching the LiteLLM JSON for your model name before assuming the provider string.
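The verification step can be scripted. This is a minimal sketch under stated assumptions: the price data is a mapping from model key to an entry dict that carries a `litellm_provider` field, and it has already been loaded into memory. The function name `find_provider_keys` and the sample entries are hypothetical; the model names in the sample are invented placeholders, not real LiteLLM entries.

```python
def find_provider_keys(prices: dict, model_name: str) -> dict:
    """Map every price-file key containing `model_name` to its provider string."""
    matches = {}
    for key, entry in prices.items():
        # Skip non-dict entries defensively; only dict entries can carry a provider.
        if model_name in key and isinstance(entry, dict):
            matches[key] = entry.get("litellm_provider")
    return matches

# Hypothetical excerpt shaped like the price file, with invented entries.
sample_prices = {
    "example-model-pro": {"litellm_provider": "vertex_ai-language-models"},
    "other/example-model-pro": {"litellm_provider": "other"},
    "unrelated-model": {"litellm_provider": "openai"},
}

provider_matches = find_provider_keys(sample_prices, "example-model-pro")
print(provider_matches)
```

If a model appears under more than one key, the result lists each candidate provider, which makes it clear which string the alias table needs to target.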
## The alternative considered
Direct website scraping was evaluated and rejected due to the problems listed above. LiteLLM is the standard community solution for this exact use case.


@@ -0,0 +1,151 @@
---
title: "PHP — display_errors Leaking Warnings into JSON API Responses"
aliases: [php-display-errors, php-json-api-errors, php-unexpected-token, php-ini-set-override]
tags: [php, debugging, api, json, backend, production]
sources:
- "daily/2026-04-27.md"
created: 2026-04-27
updated: 2026-04-27
---
# PHP — display_errors Leaking Warnings into JSON API Responses
When PHP's `display_errors` is enabled, any warning, notice, or error emitted before the JSON response body is prepended to the output buffer. The API client receives something like `<br/><b>Warning</b>: ...{"result": "..."}` — valid JSON is now invalid because it's prefixed with HTML. JavaScript's `JSON.parse()` throws "Unexpected token '<'", which is the reliable diagnostic signal.
## Key Points
- **"Unexpected token '<'" in JSON.parse** = PHP warning or error is being emitted before the JSON body — not a data problem, a `display_errors` problem
- **`ini_set('display_errors', 0)` in `api.php` does NOT override `display_errors = 1` in an included config file loaded after it** — PHP processes `ini_set` sequentially; a config file with `display_errors = 1` loaded by `require_once('config.php')` after the ini_set undoes the suppression
- **`config.example.php` committed with `display_errors = 1`** is a common footgun — developers copy it to `config.php` on the server without changing the value
- **The fix**: set `display_errors = 0` directly in `php.ini` or `.htaccess`, or ensure the config file is loaded BEFORE the ini_set call, or use an output buffer (`ob_start`) to catch stray output before sending JSON
- Probabilistic/intermittent bugs (10% of requests) that delete required directories are especially dangerous — they interact with display_errors to produce sporadic failures
## Details
### How the Leak Happens
```php
// api.php — attempts to suppress errors
ini_set('display_errors', 0);
// Later in the request lifecycle:
require_once('config.php'); // sets display_errors = 1 — overrides the ini_set above
// Later still, something triggers a warning:
file_put_contents('/path/that/was/deleted', $data);
// PHP emits: "<br/><b>Warning</b>: file_put_contents(...)..."
// api.php then sends JSON:
header('Content-Type: application/json');
echo json_encode($result);
// Output: "<br/><b>Warning</b>: file_put_contents(...)... {\"key\":\"value\"}"
```
The client receives this mixed output, `JSON.parse()` sees `<` as the first character, and throws:
```
SyntaxError: Unexpected token '<', "<br><b>War"... is not valid JSON
```
### Diagnosing the Root Cause
When a PHP API returns "Unexpected token '<'":
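1. **Check `display_errors` in `php.ini`:** `php -r "echo ini_get('display_errors');"` should print `0` in production. The CLI can load a different `php.ini` than the web server, so confirm the value under the web SAPI as well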
2. **Check included config files:** `grep -r "display_errors" /path/to/project/` — find every location it's set
3. **Check load order:** if `config.php` is loaded after `ini_set('display_errors', 0)`, the config wins
4. **Test with curl to see raw output:** `curl -i https://yourapi.com/endpoint` — the HTML warning appears before the JSON
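The raw-output check in step 4 can also be automated on the client side. The sketch below is an illustrative helper, not part of the project: `split_stray_prefix` is a hypothetical name, and it is written in Python (the language used for added examples in this vault) with only the standard `json` module. It scans a raw response body for the first position where valid JSON begins and returns whatever came before it.

```python
import json

def split_stray_prefix(body: str):
    """Return (stray_prefix, parsed_json); parsed_json is None if no JSON is found."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(body):
        if char in "{[":
            try:
                # raw_decode parses one JSON value starting at `index`.
                parsed, _end = decoder.raw_decode(body, index)
            except json.JSONDecodeError:
                continue  # not valid JSON from here; keep scanning
            return body[:index], parsed
    return body, None

leaked = '<br/><b>Warning</b>: file_put_contents(): failed{"key": "value"}'
prefix, payload = split_stray_prefix(leaked)
print(repr(prefix), payload)
```

A non-empty prefix that starts with `<` is the signature of the `display_errors` leak; a clean response yields an empty prefix.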
### The Fix: Three Options
**Option 1: Fix in `php.ini` or `.htaccess` (recommended)**
```ini
; php.ini
display_errors = Off
log_errors = On
error_log = /var/log/php/errors.log
```
```apache
# .htaccess
php_flag display_errors Off
```
This applies regardless of what application code does.
**Option 2: Set in config.php before any other code**
```php
// config.php — must be the FIRST setting, before any require_once
ini_set('display_errors', 0);
error_reporting(E_ALL);
ini_set('log_errors', 1);
ini_set('error_log', '/var/log/php/errors.log');
```
**Option 3: Output buffer (belt-and-suspenders)**
```php
// api.php
ob_start(); // catch any stray output
require_once('config.php');
// ... application code ...
$output = ob_get_clean(); // clear the buffer
if (strpos($output, '<') !== false) {
    error_log("Stray HTML in API output: " . substr($output, 0, 200));
}
echo json_encode($result); // only the JSON goes to the client
```
### The Probabilistic Cleanup Bug (2026-04-27 Incident)
The actual bug that triggered the display_errors issue:
```php
// ❌ DANGEROUS: auto-cleanup deletes the images/ directory itself, not just old files
function autoCleanupExpiredImages() {
    if (rand(1, 10) === 1) { // 10% chance per request
        $files = glob('images/*');
        if (count($files) > 50) {
            array_map('unlink', $files);
            rmdir('images/'); // deletes the directory!
        }
    }
}
```
The cleanup ran in the same request as `saveImage()`, deleted the `images/` directory, then `saveImage()` tried to write there and emitted a PHP warning. With `display_errors = 1`, that warning prepended the JSON response.
The defensive fix in `saveImage()`:
```php
function saveImage($filename, $data) {
    if (!is_dir('images/')) {
        mkdir('images/', 0755, true); // recreate if missing
    }
    file_put_contents('images/' . $filename, $data);
}
```
The deeper fix: cleanup functions should delete the files inside the directory, never the directory itself.
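The shape of a safer cleanup is sketched below in Python rather than PHP, purely as an illustration of the principle; it is not the code deployed in the incident. The function name `cleanup_expired_files` and the age-based expiry rule are assumptions. The point is that only regular files past a cutoff are removed, and the directory is left in place.

```python
import time
from pathlib import Path

def cleanup_expired_files(directory: Path, max_age_seconds: float, now=None) -> list:
    """Delete regular files older than max_age_seconds; never remove the directory."""
    now = time.time() if now is None else now
    removed = []
    if not directory.is_dir():
        return removed  # nothing to clean; do not create or delete anything
    for path in directory.iterdir():
        # Compare the file's modification time against the cutoff.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Called as `cleanup_expired_files(Path("images"), 3600)`, this would remove files older than an hour and leave `images/` itself available for the next write.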
### config.example.php as a Footgun
Any `config.example.php` committed to version control with `display_errors = 1` becomes the template for production deployments. Developers copy it to `config.php` and never change the value. This is how `display_errors = 1` ends up in production.
Correct `config.example.php`:
```php
// config.example.php — safe defaults for production
define('DISPLAY_ERRORS', false);
ini_set('display_errors', 0);
ini_set('log_errors', 1);
ini_set('error_log', __DIR__ . '/logs/errors.log');
// API Keys — replace with actual values
define('GEMINI_API_KEY', 'YOUR_KEY_HERE');
```
## Related Concepts
- [[wiki/concepts/shell-static-deploy-patterns]] — deploy script patterns; similar "safe defaults in committed config" principle applies
- [[wiki/concepts/monorepo-deploy-script-pitfall]] — another class of "silent failure from config oversight"
- [[wiki/tech-patterns/nodejs-vanilla-proxy]] — Node.js API alternative where this class of PHP error doesn't apply
## Sources
- [[daily/2026-04-27.md]] — Lux Studio (AI Cinematography): `JSON.parse` "Unexpected token '<'" error; root cause was two bugs interacting: `autoCleanupExpiredImages()` deleting `images/` directory (10% chance) + `config.php` having `display_errors = 1` from `config.example.php` template; fixes: `saveImage()` recreates directory + `display_errors = 0` in config


@@ -0,0 +1,131 @@
---
title: "Pydantic v2 — Field(alias='_id') Serializes as _id, Breaking Frontend .id Access"
aliases: [pydantic-v2-alias, pydantic-alias-id, pydantic-mongodb-id, fastapi-mongodb-id]
tags: [pydantic, fastapi, mongodb, python, frontend, debugging, gotcha]
sources:
- "daily/2026-04-27.md"
created: 2026-04-27
updated: 2026-04-27
---
# Pydantic v2 — Field(alias='_id') Serializes as _id, Breaking Frontend .id Access
When a Pydantic v2 response model uses `Field(alias="_id")` to map MongoDB's `_id` field, the JSON response contains the key `_id` — not `id`. Frontend JavaScript code that reads `.id` from the response gets `undefined`. Downstream effects are silent and varied: dropdown selects that use `value={item.id}` silently use the display text as the value, causing API calls with URL paths like `/clients/3M/teams` (using the name) instead of `/clients/64a3b.../teams` (using the ObjectId).
## Key Points
- **`Field(alias="_id")` in Pydantic v2** causes the response JSON key to be `_id`, not `id` — even if the Python attribute is named `id`
- **Frontend receives `{_id: "...", name: "..."}` not `{id: "...", name: "..."}`** — accessing `.id` returns `undefined`
- **Silent failure in dropdowns**: a `<Select value={item.id}>` where `item.id === undefined` causes the browser to use the label text as the option value — the wrong value is sent to the API
- **The fix**: remove the alias from response models; use an explicit `_from_doc()` helper that maps `_id → id` and drops the MongoDB key
- FastAPI serializes response models by alias by default (`response_model_by_alias=True`) under both Pydantic v1 and v2; setting `response_model_by_alias=False` on the route is a workaround, but removing the alias is the fix adopted here
## Details
### The Problem
```python
# ❌ BROKEN: Pydantic v2 response model
from pydantic import BaseModel, Field

class ClientResponse(BaseModel):
    id: str = Field(alias="_id")  # causes JSON to contain "_id" not "id"
    name: str
    domain: str
```
```json
// JSON response from FastAPI
{"_id": "64a3b...", "name": "3M", "domain": "3m.com"}
```
```typescript
// Frontend
const clients = await (await fetch("/clients")).json() // fetch() resolves to a Response; parse the body
clients.map(c => c.id) // → [undefined, undefined, undefined]
// c._id has the value, but JS reads c.id
```
When a React `<Select>` option uses `value={client.id}` and `client.id` is `undefined`, React omits the `value` attribute, and the browser falls back to the option's text content as its value. The form submits the client's name ("3M") as the ID, and the API endpoint `/clients/3M/teams` returns 404.
### The Fix: Explicit _from_doc() Helper
Remove all aliases from response models. Use a classmethod to construct the model from a MongoDB document:
```python
# ✅ CORRECT: no alias, explicit mapping
from pydantic import BaseModel

class ClientResponse(BaseModel):
    id: str  # JSON key is "id"
    name: str
    domain: str

    @classmethod
    def _from_doc(cls, doc: dict) -> "ClientResponse":
        return cls(
            id=str(doc["_id"]),  # explicit rename from _id to id
            name=doc["name"],
            domain=doc.get("domain", ""),
        )
```
```python
# In the route
@router.get("/clients")
async def list_clients(db=Depends(get_db)):
    docs = await db.clients.find().to_list(None)
    return [ClientResponse._from_doc(doc) for doc in docs]
```
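Since several response models need the same `_id` to `id` rename, the mapping can be factored into one shared function. This is a minimal sketch assuming documents arrive as plain dicts; `doc_to_api_dict` is a hypothetical name and is not part of the project or of Pydantic.

```python
def doc_to_api_dict(doc: dict) -> dict:
    """Return a copy of a MongoDB-style document with `_id` exposed as a string `id`."""
    if "_id" not in doc:
        raise KeyError("document has no '_id' field")
    # Copy every field except the MongoDB key, leaving the input untouched.
    api_dict = {key: value for key, value in doc.items() if key != "_id"}
    api_dict["id"] = str(doc["_id"])
    return api_dict

example_doc = {"_id": 12345, "name": "3M", "domain": "3m.com"}
api_view = doc_to_api_dict(example_doc)
print(api_view)
```

A `_from_doc()` classmethod could then pass the returned dict to the model constructor, so each model keeps a single place where the rename happens.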
### Frontend Defensive Guard
While fixing the backend is the correct solution, add a guard in hooks to catch future regressions:
```typescript
// React hook with guard
const data = await (await fetch("/clients")).json() // fetch() resolves to a Response; parse the body
if (data.length > 0 && data[0].id === undefined) {
  console.error("API response missing 'id' field; backend may be returning '_id'")
}
```
Also guard against `undefined` being used as a Select value:
```typescript
// Only fetch teams if we have a valid client ID
const { data: teams } = useQuery(
  ["teams", clientId],
  () => fetchTeams(clientId),
  { enabled: clientId !== undefined && clientId !== "" }
)
```
### Pydantic v2 Behavior vs v1
In Pydantic v1, `Field(alias="_id")` combined with FastAPI's default `response_model_by_alias=True` serialized responses using the alias (`_id`). This was the intended behavior for MongoDB models, but it produced the same frontend-breaking JSON key.
In Pydantic v2 the result under FastAPI's defaults is the same: the response key is the alias. Pydantic's own `model_dump()` defaults to `by_alias=False`; it is FastAPI's `response_model_by_alias` (default `True`) that selects the alias, so setting it to `False` on a route is an alternative. The approach adopted here is to avoid aliases on response models entirely and use `_from_doc()` helpers.
### Which Models Are Affected
This issue affects any Pydantic response model that maps MongoDB's `_id`:
```python
# All of these produce "_id" in JSON response (v2 behavior)
id: str = Field(alias="_id")
id: PyObjectId = Field(alias="_id")
id: Annotated[str, Field(alias="_id")]
```
The pattern is ubiquitous in FastAPI+MongoDB tutorials because it matches Pydantic v1 idioms that no longer work correctly in v2.
### Real Incident (2026-04-27)
video-accessibility admin panel: `ClientResponse`, `TeamResponse`, and `ProjectResponse` all used `Field(alias="_id")`. The client/team/project cascading dropdown selects silently used display names as values. Creating a job for "3M" → Team dropdown → API called `/clients/3M/teams` → 404. Fixed in commit during the 2026-04-27 session by removing aliases and adding `_from_doc()` classmethod to all three response models.
## Related Concepts
- [[wiki/concepts/fastapi-mongodb-role-migration]] — FastAPI + MongoDB backend patterns; `_from_doc` helper fits into this same pattern
- [[wiki/tech-patterns/fastapi-python-docker]] — FastAPI tech stack used in Oliver Agency projects
- [[wiki/concepts/export-endpoint-filter-pattern]] — another case of frontend state being silently wrong due to ID mismatch
## Sources
- [[daily/2026-04-27.md]] — video-accessibility admin dropdown 404s; root cause was Pydantic v2 `Field(alias="_id")` serializing as `_id` in JSON; `client.id === undefined` on frontend; fixed by removing aliases and adding `_from_doc()` helpers; same pattern applied to Team and Project models


@@ -3,6 +3,11 @@
<!-- Append-only chronological record of compile, query, and lint operations -->
## [2026-04-28T22:30:00+01:00] compile | 2026-04-27.md
- Source: daily/2026-04-27.md
- Articles created: [[wiki/concepts/asyncio-contextvar-task-boundary]], [[wiki/concepts/pydantic-v2-alias-id-gotcha]], [[wiki/concepts/php-display-errors-json-leak]]
- Articles updated: [[wiki/concepts/litellm-pricing-source]] (added Gemini vertex_ai-language-models provider key gotcha)
## [2026-04-14T00:00:00] init | Knowledge Base Initialized
- System set up from coleam00/claude-memory-compiler
- Based on Karpathy's LLM Wiki architecture