The Anthropic SDK refuses non-streaming calls expected to take >10
minutes ("Streaming is required..."). Long-output batches (32k tokens
of densely-formatted markdown) hit this on real 172-line briefs.
Both LLMClient.create_message and create_message_cached now use the
streaming context manager (client.messages.stream(...)) and accumulate
text chunks; final usage + stop_reason come from get_final_message().
No timeout on streaming requests.
Tightened the batch tier so individual streams stay well under any
ceiling and progress / failure recovery is more granular:
- ≤50 lines: single call
- 51-120: batches of 30 (max_tokens=16k each)
- 121+: batches of 25 (max_tokens=16k each)
Verified with the 172-line case: 7 batches of 25, 172 drafts produced.
Live streaming call confirmed end-to-end (haiku returned, usage and
stop_reason populated correctly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously briefs above ~150 source lines hit the Sonnet 4.6 64k output
cap and were silently truncated. Now we batch:
- ≤70 lines: one LLM call (no change)
- 71-150: batches of 50 (2-3 calls)
- 151+: batches of 40 (unbounded)
Each batch uses Anthropic prompt caching: the V25 system prompt + job
parameters + TM entries + reference data + supplementary files form a
cached prefix; only the per-batch source lines vary. After the first
batch, subsequent batches read the prefix from cache at ~10% input cost,
so an N-batch job costs roughly (1 + 0.1*(N-1)) full prompts instead
of N.
Implementation:
- New LLMClient.create_message_cached / acreate_message_cached methods
that mark system_prompt and cached_user_content with cache_control:
ephemeral. Tracks cache_creation_input_tokens and
cache_read_input_tokens in usage and applies the right cost rates
(1.25x for writes, 0.1x for reads).
- AgentSingle.run() refactored to build the cached static prefix once,
then loop over batches sending only the per-batch source lines as the
dynamic content. Each batch's parsed rows are appended to
context.draft_outputs / ranking_declarations.
- Per-batch instructions added to the prompt for batched runs ("This is
batch N of M ... output a table for these lines only ... do not
repeat prior batches"). Single-call runs (≤70 lines) skip this note.
- Linguistic summary: kept from the last batch (batched mode) or the
single batch (single mode).
- Per-batch logging of input_tokens / cache_read / cache_creation /
output_tokens / stop_reason for visibility.
Verified end-to-end: N=10/70/100/150/250 produce 1/1/2/3/7 LLM calls
with correct draft counts, and live caching reads the cached prefix on
the second call within the 5-minute TTL.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TM upload-replacement bug (critical):
- Uploads were writing to /storage/clients/<uuid>/tm/... but the pipeline
reads from /storage/amazon/tm/... — replacements were silently ignored
- upload_tm_file now writes to the canonical pipeline path
/storage/amazon/tm/<locale>/flat_<channel>_<lc>.json (overwrites in place)
- Filename casing is preserved when an existing file is being replaced
(the on-disk seeded files use mixed casing: flat_MASS, flat_value,
flat_PrimeSpeed); falls back to CHANNEL_FILE_MAP, then user-typed case
- Registry upsert by (client_id, locale_code, channel): replaces row in
place rather than inserting duplicates
- Verified: replacement file at canonical path, registry COUNT=1, no dupes
Supplementary files now reach the LLM (critical):
- New supplementary_files field on FileManifest
- _resolve_file_manifest scans /storage/jobs/<job_id>/supplementary/ and
populates the manifest, with per-locale gating by filename prefix
(e.g. de-DE_glossary.txt only goes to de-DE; global_brief.txt goes to all)
- _format_supplementary_for_prompt reads each file (.txt/.md/.json/.csv/.tsv
/.docx) and inlines its text into the LLM user message under a
"## SUPPLEMENTARY MATERIAL" header, capped at 40k chars per file
- .docx files are extracted via inline zipfile read (no new dependency)
New job wizard:
- Per-supplementary-file locale dropdown ("Global" or one of 12 locales)
- Filename gets prefixed with the locale on upload (de-DE_brief.docx)
Admin TM upload:
- Channel field is now a free-text input with autocomplete suggestions
(datalist of known channels) — lets users add brand-new channels like
PrimeCBM that didn't exist before
Pipeline scaling:
- Bumped dynamic max_tokens tiers: 80+ lines now gets 64k output budget
(was 32k); 132-line briefs no longer truncate. Sonnet 4.6 caps at 64k
- Added stop_reason logging — "max_tokens" stop now shows up in logs
loud and clear rather than silently truncating
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The V25 table has duplicate column names (Backtranslation x3, Rationale x3).
The dict-based parser collapsed these — only the last value survived (Option 3's
"N/A"), causing all BT/rationale fields to be "N/A" in the output Excel.
Fixed by switching to positional list-based parsing instead of dicts.
Also adds per-job model selection (Sonnet 4.6 / Opus 4.6) through the full
stack: DB column, API schema, job wizard UI dropdown, pipeline contracts, and
LLM client with model-aware cost tracking. Includes Alembic migration.
Updated help page and README to reflect single-agent pipeline, multi-TM
selection, flat locale grid, model selector, and linguistic summary.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>