The Anthropic SDK refuses non-streaming calls expected to take >10
minutes ("Streaming is required..."). Long-output batches (32k tokens
of densely formatted markdown) hit this limit on real 172-line briefs.
Both LLMClient.create_message and create_message_cached now use the
streaming context manager (client.messages.stream(...)) and accumulate
text chunks; final usage + stop_reason come from get_final_message().
No timeout on streaming requests.
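
A minimal sketch of the streaming pattern described above, not the actual
LLMClient code. The helper name stream_message and the StreamResult
container are hypothetical; client is assumed to be an anthropic.Anthropic
instance, and only client.messages.stream(...), stream.text_stream and
stream.get_final_message() are taken from the SDK.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StreamResult:
    """Hypothetical container for what the caller needs after a stream."""
    text: str
    stop_reason: Optional[str]
    usage: Any


def stream_message(client: Any, **request: Any) -> StreamResult:
    """Run one request through the streaming context manager.

    `client` is assumed to be an anthropic.Anthropic instance; `request`
    carries the usual keyword arguments (model, max_tokens, messages, ...).
    """
    chunks = []
    with client.messages.stream(**request) as stream:
        # Accumulate text deltas as they arrive.
        for text in stream.text_stream:
            chunks.append(text)
        # The final message carries usage and stop_reason.
        final = stream.get_final_message()
    return StreamResult("".join(chunks), final.stop_reason, final.usage)
```

Passing the client in rather than constructing it inside the helper keeps
the sketch testable with a stub and leaves timeout configuration to the
caller.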
Tightened the batch tier so individual streams stay well under any
ceiling and progress / failure recovery is more granular:
- ≤50 lines: single call
- 51-120 lines: batches of 30 (max_tokens=16k each)
- 121+ lines: batches of 25 (max_tokens=16k each)
Verified with the 172-line case: 7 batches at batch size 25, 172 drafts
produced.
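
The tier rules can be sketched as a small planning function. The name
plan_batches is hypothetical, and putting the remainder in a final short
batch is an assumption; the commit only states the batch sizes and the
batch count for the 172-line case.

```python
from __future__ import annotations


def plan_batches(n_lines: int) -> list[int]:
    """Return the size of each batch for a brief with n_lines lines.

    Tiers follow the commit message: <=50 lines is a single call,
    51-120 uses batches of 30, 121+ uses batches of 25.
    Assumption: leftover lines go into one final, shorter batch.
    """
    if n_lines <= 0:
        return []
    if n_lines <= 50:
        return [n_lines]
    size = 30 if n_lines <= 120 else 25
    full, rest = divmod(n_lines, size)
    return [size] * full + ([rest] if rest else [])


if __name__ == "__main__":
    sizes = plan_batches(172)
    print(len(sizes), sum(sizes))  # 7 172
```

Under that assumption the 172-line brief splits into six batches of 25 and
one of 22, which matches the 7 batches and 172 drafts reported above.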
Live streaming call confirmed end-to-end (haiku returned, usage and
stop_reason populated correctly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>