amazon-transcreation

History

DJP 70cade819c Source-line batching with prompt caching for arbitrarily large briefs Previously briefs above ~150 source lines hit the Sonnet 4.6 64k output cap and were silently truncated. Now we batch: - ≤70 lines: one LLM call (no change) - 71-150: batches of 50 (2-3 calls) - 151+: batches of 40 (unbounded) Each batch uses Anthropic prompt caching: the V25 system prompt + job parameters + TM entries + reference data + supplementary files form a cached prefix; only the per-batch source lines vary. After the first batch, subsequent batches read the prefix from cache at ~10% input cost, so an N-batch job costs roughly (1 + 0.1*(N-1)) full prompts instead of N. Implementation: - New LLMClient.create_message_cached / acreate_message_cached methods that mark system_prompt and cached_user_content with cache_control: ephemeral. Tracks cache_creation_input_tokens and cache_read_input_tokens in usage and applies the right cost rates (1.25x for writes, 0.1x for reads). - AgentSingle.run() refactored to build the cached static prefix once, then loop over batches sending only the per-batch source lines as the dynamic content. Each batch's parsed rows are appended to context.draft_outputs / ranking_declarations. - Per-batch instructions added to the prompt for batched runs ("This is batch N of M ... output a table for these lines only ... do not repeat prior batches"). Single-call runs (≤70 lines) skip this note. - Linguistic summary: kept from the last batch (batched mode) or the single batch (single mode). - Per-batch logging of input_tokens / cache_read / cache_creation / output_tokens / stop_reason for visibility. Verified end-to-end: N=10/70/100/150/250 produce 1/1/2/3/7 LLM calls with correct draft counts, and live caching reads the cached prefix on the second call within the 5-minute TTL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-05-05 15:02:48 -04:00
..
api	Round 2 feedback: parser fix, dynamic max_tokens, polling, TM auto-discovery, reviewer comments in export	2026-05-04 16:12:47 -04:00
auth	Implement user management: viewer role, real API wiring, admin sidebar	2026-04-15 18:37:16 +01:00
llm	Source-line batching with prompt caching for arbitrarily large briefs	2026-05-05 15:02:48 -04:00
models	Implement user management: viewer role, real API wiring, admin sidebar	2026-04-15 18:37:16 +01:00
pipeline	Source-line batching with prompt caching for arbitrarily large briefs	2026-05-05 15:02:48 -04:00
schemas	Implement user management: viewer role, real API wiring, admin sidebar	2026-04-15 18:37:16 +01:00
services	Round 2.5 feedback: TM replacements take effect, supplementary files reach LLM, larger briefs fit, free-text channel uploads	2026-05-05 14:28:20 -04:00
tasks	Round 2.5 feedback: TM replacements take effect, supplementary files reach LLM, larger briefs fit, free-text channel uploads	2026-05-05 14:28:20 -04:00
ws	feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton	2026-04-10 12:31:43 -04:00
__init__.py	feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton	2026-04-10 12:31:43 -04:00
config.py	Add Azure AD MSAL SSO (SPA token exchange)	2026-04-15 18:08:46 +01:00
dependencies.py	feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton	2026-04-10 12:31:43 -04:00
main.py	feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton	2026-04-10 12:31:43 -04:00