amazon-transcreation/backend
DJP 100eddbc21 Switch LLM calls to streaming + tighten batch sizes
The Anthropic SDK refuses non-streaming calls expected to take >10
minutes ("Streaming is required..."). Long-output batches (32k tokens
of densely-formatted markdown) hit this on real 172-line briefs.

Both LLMClient.create_message and create_message_cached now use the
streaming context manager (client.messages.stream(...)) and accumulate
text chunks; final usage + stop_reason come from get_final_message().
No timeout on streaming requests.
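
A minimal sketch of the streaming pattern described above, not the repository's actual LLMClient code. The helper name `stream_message` and the `StreamResult` container are hypothetical; the only SDK surface assumed is `client.messages.stream(...)`, the stream's `text_stream` iterator, and `get_final_message()`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StreamResult:
    """Hypothetical container for the accumulated text and final metadata."""
    text: str
    stop_reason: Optional[str]
    input_tokens: int
    output_tokens: int


def stream_message(client: Any, **params: Any) -> StreamResult:
    """Run one streaming call and collect its output.

    `client` is assumed to be an anthropic.Anthropic instance (or anything
    exposing the same `messages.stream` context manager). `params` are passed
    through unchanged, e.g. model, max_tokens, messages.
    """
    chunks = []
    with client.messages.stream(**params) as stream:
        # Accumulate text deltas as they arrive.
        for text in stream.text_stream:
            chunks.append(text)
        # The final message carries usage and stop_reason.
        final = stream.get_final_message()
    return StreamResult(
        text="".join(chunks),
        stop_reason=final.stop_reason,
        input_tokens=final.usage.input_tokens,
        output_tokens=final.usage.output_tokens,
    )
```

Taking the client as a parameter keeps the helper testable with a stub that has no network access.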

Tightened the batch tier so individual streams stay well under any
ceiling and progress / failure recovery is more granular:

- ≤50 lines: single call
- 51-120: batches of 30 (max_tokens=16k each)
- 121+:   batches of 25 (max_tokens=16k each)
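
The tiers above can be sketched as follows. This is an illustration only: `batch_size_for` and `split_into_batches` are hypothetical names, and the per-batch max_tokens setting is not modelled here.

```python
from typing import List, Sequence


def batch_size_for(n_lines: int) -> int:
    """Return the batch size for a brief of n_lines, following the tiers above."""
    if n_lines <= 50:
        return n_lines  # single call covers everything
    if n_lines <= 120:
        return 30
    return 25


def split_into_batches(lines: Sequence[str]) -> List[Sequence[str]]:
    """Split lines into consecutive batches; the last batch may be shorter."""
    if not lines:
        return []
    size = batch_size_for(len(lines))
    return [lines[i:i + size] for i in range(0, len(lines), size)]
```

With these tiers, a 172-line brief splits into seven batches: six of 25 lines and a final one of 22.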

Verified with the 172-line case: 7 batches (six of 25, one of 22), 172 drafts produced.
Live streaming call confirmed end-to-end (haiku returned, usage and
stop_reason populated correctly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:20:16 -04:00
alembic Implement user management: viewer role, real API wiring, admin sidebar 2026-04-15 18:37:16 +01:00
app Switch LLM calls to streaming + tighten batch sizes 2026-05-06 12:20:16 -04:00
tests feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
alembic.ini feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
Dockerfile feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00
requirements.txt feat: complete Phase 1-2 scaffold — backend, frontend, pipeline skeleton 2026-04-10 12:31:43 -04:00