amazon-transcreation/backend
DJP 70cade819c Source-line batching with prompt caching for arbitrarily large briefs
Previously briefs above ~150 source lines hit the Sonnet 4.6 64k output
cap and were silently truncated. Now we batch:

- ≤70 lines:  one LLM call (no change)
- 71-150:     batches of 50 (2-3 calls)
- 151+:       batches of 40 (unbounded)

Each batch uses Anthropic prompt caching: the V25 system prompt + job
parameters + TM entries + reference data + supplementary files form a
cached prefix; only the per-batch source lines vary. After the first
batch, subsequent batches read the prefix from cache at ~10% input cost,
so an N-batch job costs roughly (1 + 0.1*(N-1)) full prompts instead
of N.

Implementation:
- New LLMClient.create_message_cached / acreate_message_cached methods
  that mark system_prompt and cached_user_content with cache_control:
  ephemeral. Tracks cache_creation_input_tokens and
  cache_read_input_tokens in usage and applies the right cost rates
  (1.25x for writes, 0.1x for reads).
- AgentSingle.run() refactored to build the cached static prefix once,
  then loop over batches sending only the per-batch source lines as the
  dynamic content. Each batch's parsed rows are appended to
  context.draft_outputs / ranking_declarations.
- Per-batch instructions added to the prompt for batched runs ("This is
  batch N of M ... output a table for these lines only ... do not
  repeat prior batches"). Single-call runs (≤70 lines) skip this note.
- Linguistic summary: kept from the last batch (batched mode) or the
  single batch (single mode).
- Per-batch logging of input_tokens / cache_read / cache_creation /
  output_tokens / stop_reason for visibility.

Verified end-to-end: N=10/70/100/150/250 produce 1/1/2/3/7 LLM calls
with correct draft counts, and live caching reads the cached prefix on
the second call within the 5-minute TTL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:02:48 -04:00
..
alembic Implement user management: viewer role, real API wiring, admin sidebar 2026-04-15 18:37:16 +01:00
app Source-line batching with prompt caching for arbitrarily large briefs 2026-05-05 15:02:48 -04:00
tests feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
alembic.ini feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
Dockerfile feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
requirements.txt feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00