Commit graph

5 commits

Author SHA1 Message Date
DJP
100eddbc21 Switch LLM calls to streaming + tighten batch sizes
The Anthropic SDK refuses non-streaming calls expected to take >10
minutes ("Streaming is required..."). Long-output batches (32k tokens
of densely-formatted markdown) hit this on real 172-line briefs.

Both LLMClient.create_message and create_message_cached now use the
streaming context manager (client.messages.stream(...)) and accumulate
text chunks; final usage + stop_reason come from get_final_message().
No timeout on streaming requests.

Tightened the batch tier so individual streams stay well under any
ceiling and progress / failure recovery is more granular:

- ≤50 lines: single call
- 51-120: batches of 30 (max_tokens=16k each)
- 121+:   batches of 25 (max_tokens=16k each)

Verified with the 172-line case: 7 batches of 25, 172 drafts produced.
Live streaming call confirmed end-to-end (haiku returned, usage and
stop_reason populated correctly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:20:16 -04:00
DJP
70cade819c Source-line batching with prompt caching for arbitrarily large briefs
Previously briefs above ~150 source lines hit the Sonnet 4.6 64k output
cap and were silently truncated. Now we batch:

- ≤70 lines:  one LLM call (no change)
- 71-150:     batches of 50 (2-3 calls)
- 151+:       batches of 40 (unbounded)

Each batch uses Anthropic prompt caching: the V25 system prompt + job
parameters + TM entries + reference data + supplementary files form a
cached prefix; only the per-batch source lines vary. After the first
batch, subsequent batches read the prefix from cache at ~10% input cost,
so an N-batch job costs roughly (1 + 0.1*(N-1)) full prompts instead
of N.

Implementation:
- New LLMClient.create_message_cached / acreate_message_cached methods
  that mark system_prompt and cached_user_content with cache_control:
  ephemeral. Tracks cache_creation_input_tokens and
  cache_read_input_tokens in usage and applies the right cost rates
  (1.25x for writes, 0.1x for reads).
- AgentSingle.run() refactored to build the cached static prefix once,
  then loop over batches sending only the per-batch source lines as the
  dynamic content. Each batch's parsed rows are appended to
  context.draft_outputs / ranking_declarations.
- Per-batch instructions added to the prompt for batched runs ("This is
  batch N of M ... output a table for these lines only ... do not
  repeat prior batches"). Single-call runs (≤70 lines) skip this note.
- Linguistic summary: kept from the last batch (batched mode) or the
  single batch (single mode).
- Per-batch logging of input_tokens / cache_read / cache_creation /
  output_tokens / stop_reason for visibility.

Verified end-to-end: N=10/70/100/150/250 produce 1/1/2/3/7 LLM calls
with correct draft counts, and live caching reads the cached prefix on
the second call within the 5-minute TTL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:02:48 -04:00
DJP
d3f6a57386 Round 2.5 feedback: TM replacements take effect, supplementary files reach LLM, larger briefs fit, free-text channel uploads
TM upload-replacement bug (critical):
- Uploads were writing to /storage/clients/<uuid>/tm/... but the pipeline
  reads from /storage/amazon/tm/... — replacements were silently ignored
- upload_tm_file now writes to the canonical pipeline path
  /storage/amazon/tm/<locale>/flat_<channel>_<lc>.json (overwrites in place)
- Filename casing is preserved when an existing file is being replaced
  (the on-disk seeded files use mixed casing: flat_MASS, flat_value,
  flat_PrimeSpeed); falls back to CHANNEL_FILE_MAP, then user-typed case
- Registry upsert by (client_id, locale_code, channel): replaces row in
  place rather than inserting duplicates
- Verified: replacement file at canonical path, registry COUNT=1, no dupes

Supplementary files now reach the LLM (critical):
- New supplementary_files field on FileManifest
- _resolve_file_manifest scans /storage/jobs/<job_id>/supplementary/ and
  populates the manifest, with per-locale gating by filename prefix
  (e.g. de-DE_glossary.txt only goes to de-DE; global_brief.txt goes to all)
- _format_supplementary_for_prompt reads each file (.txt/.md/.json/.csv/.tsv
  /.docx) and inlines its text into the LLM user message under a
  "## SUPPLEMENTARY MATERIAL" header, capped at 40k chars per file
- .docx files are extracted via inline zipfile read (no new dependency)

New job wizard:
- Per-supplementary-file locale dropdown ("Global" or one of 12 locales)
- Filename gets prefixed with the locale on upload (de-DE_brief.docx)

Admin TM upload:
- Channel field is now a free-text input with autocomplete suggestions
  (datalist of known channels) — lets users add brand-new channels like
  PrimeCBM that didn't exist before

Pipeline scaling:
- Bumped dynamic max_tokens tiers: 80+ lines now gets 64k output budget
  (was 32k); 132-line briefs no longer truncate. Sonnet 4.6 caps at 64k
- Added stop_reason logging — "max_tokens" stop now shows up in logs
  loud and clear rather than silently truncating

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 14:28:20 -04:00
DJP
d5fa4e49f7 Fix markdown table parser losing backtranslations/rationales, add model selection, update help page
The V25 table has duplicate column names (Backtranslation x3, Rationale x3).
The dict-based parser collapsed these — only the last value survived (Option 3's
"N/A"), causing all BT/rationale fields to be "N/A" in the output Excel.

Fixed by switching to positional list-based parsing instead of dicts.

Also adds per-job model selection (Sonnet 4.6 / Opus 4.6) through the full
stack: DB column, API schema, job wizard UI dropdown, pipeline contracts, and
LLM client with model-aware cost tracking. Includes Alembic migration.

Updated help page and README to reflect single-agent pipeline, multi-TM
selection, flat locale grid, model selector, and linguistic summary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 12:40:17 -04:00
DJP
98fa16bfc3 feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton
Full-stack Amazon AI Transcreation Platform with:
- FastAPI backend (async, PostgreSQL, Redis, Celery) with 11 DB tables
- JWT auth (SSO-ready abstract provider pattern)
- 6-agent pipeline orchestrator with deterministic modules
- Next.js 14 frontend with Amazon branding (Ember fonts, orange/dark theme)
- Job wizard, monitoring HUD, output review, admin screens
- 154 TM/reference files imported, 12 locales configured
- Docker Compose for all services

Agents 2-5 (TM retrieval, ranker, transcreator, compliance) are stubs
pending Phase 3 LLM integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 12:31:43 -04:00