Old logic used output text length as a proxy for prompt tokens, which is completely wrong. Real Gemini calls send the full conversation history as context, so prompt grows with every turn.

New logic:
- completion_tokens = len(response_text) / 3.8 (what was generated)
- prompt_tokens = base_template + sum(all_prior_messages_in_fg) / 3.8
- persona_response base: 1500 tok (template + persona details + topic)
- moderator base: 1200 tok (moderator template + fg context)
- persona_generate base: 2500 tok (persona-detailed-generation.md template)

Also:
- Sorts messages chronologically per focus group before processing
- Accumulates context correctly so turn N includes turns 0..N-1 as context
- Idempotency via pre-fetched set instead of per-doc find_one queries
- cost_usd breakdown now has correct input/output split (not 40/60 guess)
- Dry-run prints per-focus-group cost estimates for sanity checking

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

| Name | | |
|---|---|---|
| backfill_usage.py | ||
| generate_architecture_doc.py | ||
| populate_db.py | ||
| populate_db_direct.py | ||
| seed_model_pricing.py | ||
| setup_mongodb.sh | ||
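
The token estimate described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual code in `backfill_usage.py`: the function name `estimate_usage` and the message keys (`fg_id`, `created_at`, `kind`, `text`) are hypothetical, and it assumes "sum of prior messages" means their total character count. Only the 3.8 chars-per-token ratio and the three base sizes come from the commit message.

```python
from collections import defaultdict

CHARS_PER_TOKEN = 3.8  # ratio quoted in the commit message

# Base prompt sizes in tokens, as quoted in the commit message.
BASE_PROMPT_TOKENS = {
    "persona_response": 1500,
    "moderator": 1200,
    "persona_generate": 2500,
}


def estimate_usage(messages):
    """Estimate prompt/completion tokens per message.

    `messages` is a list of dicts with hypothetical keys:
    'fg_id', 'created_at', 'kind', 'text'.
    """
    # Group messages by focus group so context never leaks across groups.
    by_group = defaultdict(list)
    for msg in messages:
        by_group[msg["fg_id"]].append(msg)

    estimates = []
    for fg_id, group in by_group.items():
        # Chronological order, so turn N sees turns 0..N-1 as context.
        group.sort(key=lambda m: m["created_at"])
        context_chars = 0
        for msg in group:
            prompt_tokens = BASE_PROMPT_TOKENS[msg["kind"]] + round(
                context_chars / CHARS_PER_TOKEN
            )
            completion_tokens = round(len(msg["text"]) / CHARS_PER_TOKEN)
            estimates.append(
                {
                    "fg_id": fg_id,
                    "kind": msg["kind"],
                    "prompt_tokens": prompt_tokens,
                    "completion_tokens": completion_tokens,
                }
            )
            # This turn's output becomes context for every later turn.
            context_chars += len(msg["text"])
    return estimates
```

Because the context counter only grows after a turn has been estimated, each turn's prompt includes every earlier message in its focus group but never its own output.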