The agent reported (for an nl-BE job) that glossary and blacklist were
"not provided" and date/percentage formats were "provided but empty".
The files are on disk with real content — the bug was in the loaders,
which expected shapes that didn't match what's actually shipped:
- load_glossary expected a top-level JSON list, but files use
{"locale": "...", "entries": [...]}. RefFileLoadError raised,
silently caught by load_all_reference_files, result became None.
- load_blacklist had the same mismatch, same outcome.
- load_date_pct_formats accepted the dict shape but only knew about
the "date_formats"/"percentage_formats" keys; the files use
"entries" → returned {"date_formats": [], "percentage_formats": []}
which is exactly what the agent reported.
Fix:
- New _extract_entries() helper that accepts both the wrapper shape
{entries: [...]} and a bare list. load_glossary / load_blacklist
both delegate to it.
- load_date_pct_formats now passes entries through alongside the
legacy date_formats / percentage_formats keys (back-compat).
- load_all_reference_files now logs a warning when a loader raises
RefFileLoadError instead of silently swallowing it — so any future
loader/file-shape drift surfaces in the celery logs.
Verified inside the backend container against nl-BE, de-DE, fr-FR:
- 58 / 68 / 64 glossary entries respectively (was 0)
- 14 / 9 / 4 blacklist entries (was 0)
- 10 / 10 / 10 date/pct entries (was empty)
- locale_considerations and tov_global still load correctly
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug 1: Empty tm_channels was silently re-defaulted to [campaign channel]
in both agent_single.py and job_tasks.py via `or [channel]`. Python's
`or` treats [] as falsy, so the frontend's empty-list intent was lost.
Fixed by replacing `or` with an explicit `is not None` check at both
sites. Empty list now means "load no TMs"; None still falls back.
Bug 2: Supplementary files dropped by Agent1Validator. The validator
built FileManifest(...) with explicit kwargs but forgot
supplementary_files, so the raw field from _resolve_file_manifest
never reached agent_single.run(). Files were uploaded to disk but
never inlined into the LLM context. Fixed by adding
supplementary_files=raw.get("supplementary_files", []) to the
validator's FileManifest construction.
Bug 3: New TM channels lowercased in StepReview.tsx, breaking
case-sensitive file lookup. On Linux, "flat_primecbmt_nl-be.json"
≠ "flat_PrimeCBMT_nl-be.json", so the file was silently skipped and
zero TM entries loaded. Legacy channels worked only because the
hardcoded CHANNEL_FILE_MAP has lowercase keys mapping to
canonically-cased filenames; auto-discovered channels (PrimeCBM,
PrimeCBMT, etc.) had no such safety net. Two-part fix:
3a. StepReview.tsx no longer lowercases tm_channels — preserves case
end-to-end from registry → frontend → backend → disk.
3b. _resolve_all_tm_paths builds a case-insensitive index of the
locale's TM directory once per call and resolves filenames
against it. Forgives any historical case-drift between registry
and disk.
Verified end-to-end with a standalone test script run inside the
backend container: 8/8 assertions pass covering empty tm_channels,
supplementary file passthrough, exact-case lookups, lowercase
fallback, missing channels, legacy MASS in both cases, and empty
tm_channels yielding no TM paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Anthropic SDK refuses non-streaming calls expected to take >10
minutes ("Streaming is required..."). Long-output batches (32k tokens
of densely-formatted markdown) hit this on real 172-line briefs.
Both LLMClient.create_message and create_message_cached now use the
streaming context manager (client.messages.stream(...)) and accumulate
text chunks; final usage + stop_reason come from get_final_message().
No timeout on streaming requests.
Tightened the batch tier so individual streams stay well under any
ceiling and progress / failure recovery is more granular:
- ≤50 lines: single call
- 51-120: batches of 30 (max_tokens=16k each)
- 121+: batches of 25 (max_tokens=16k each)
Verified with the 172-line case: 7 batches of 25, 172 drafts produced.
Live streaming call confirmed end-to-end (haiku returned, usage and
stop_reason populated correctly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously briefs above ~150 source lines hit the Sonnet 4.6 64k output
cap and were silently truncated. Now we batch:
- ≤70 lines: one LLM call (no change)
- 71-150: batches of 50 (2-3 calls)
- 151+: batches of 40 (unbounded)
Each batch uses Anthropic prompt caching: the V25 system prompt + job
parameters + TM entries + reference data + supplementary files form a
cached prefix; only the per-batch source lines vary. After the first
batch, subsequent batches read the prefix from cache at ~10% input cost,
so an N-batch job costs roughly (1 + 0.1*(N-1)) full prompts instead
of N.
Implementation:
- New LLMClient.create_message_cached / acreate_message_cached methods
that mark system_prompt and cached_user_content with cache_control:
ephemeral. Tracks cache_creation_input_tokens and
cache_read_input_tokens in usage and applies the right cost rates
(1.25x for writes, 0.1x for reads).
- AgentSingle.run() refactored to build the cached static prefix once,
then loop over batches sending only the per-batch source lines as the
dynamic content. Each batch's parsed rows are appended to
context.draft_outputs / ranking_declarations.
- Per-batch instructions added to the prompt for batched runs ("This is
batch N of M ... output a table for these lines only ... do not
repeat prior batches"). Single-call runs (≤70 lines) skip this note.
- Linguistic summary: kept from the last batch (batched mode) or the
single batch (single mode).
- Per-batch logging of input_tokens / cache_read / cache_creation /
output_tokens / stop_reason for visibility.
Verified end-to-end: N=10/70/100/150/250 produce 1/1/2/3/7 LLM calls
with correct draft counts, and live caching reads the cached prefix on
the second call within the 5-minute TTL.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TM upload-replacement bug (critical):
- Uploads were writing to /storage/clients/<uuid>/tm/... but the pipeline
reads from /storage/amazon/tm/... — replacements were silently ignored
- upload_tm_file now writes to the canonical pipeline path
/storage/amazon/tm/<locale>/flat_<channel>_<lc>.json (overwrites in place)
- Filename casing is preserved when an existing file is being replaced
(the on-disk seeded files use mixed casing: flat_MASS, flat_value,
flat_PrimeSpeed); falls back to CHANNEL_FILE_MAP, then user-typed case
- Registry upsert by (client_id, locale_code, channel): replaces row in
place rather than inserting duplicates
- Verified: replacement file at canonical path, registry COUNT=1, no dupes
Supplementary files now reach the LLM (critical):
- New supplementary_files field on FileManifest
- _resolve_file_manifest scans /storage/jobs/<job_id>/supplementary/ and
populates the manifest, with per-locale gating by filename prefix
(e.g. de-DE_glossary.txt only goes to de-DE; global_brief.txt goes to all)
- _format_supplementary_for_prompt reads each file (.txt/.md/.json/.csv/.tsv
/.docx) and inlines its text into the LLM user message under a
"## SUPPLEMENTARY MATERIAL" header, capped at 40k chars per file
- .docx files are extracted via inline zipfile read (no new dependency)
New job wizard:
- Per-supplementary-file locale dropdown ("Global" or one of 12 locales)
- Filename gets prefixed with the locale on upload (de-DE_brief.docx)
Admin TM upload:
- Channel field is now a free-text input with autocomplete suggestions
(datalist of known channels) — lets users add brand-new channels like
PrimeCBM that didn't exist before
Pipeline scaling:
- Bumped dynamic max_tokens tiers: 80+ lines now gets 64k output budget
(was 32k); 132-line briefs no longer truncate. Sonnet 4.6 caps at 64k
- Added stop_reason logging — "max_tokens" stop now shows up in logs
loud and clear rather than silently truncating
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A1 Export columns shifted (critical):
- V25 LLM occasionally emits 12/13-col tables with Copy Type/Char Limit prefix
- Parser now anchors on "Option 1" header position; robust to any prefix shift
- Verified with 23/23 unit tests covering 11/12/13-col variants
- Source-line block in prompt no longer uses pipe separators (defence in depth)
A2 Linguistic summary fallback:
- Drop the metadata key/value table fallback on Tab 2
- Show "No linguistic summary was generated" when the agent didn't produce one
A3 Dashboard stuck on "Running":
- useJobs / useJob now poll every 5s while any job/locale is in an active state
- Stops polling once everything is COMPLETED or ERROR
B1 TM auto-config: respect empty selection
- Send no TM files when user unchecks all (was auto-adding campaign channel)
- Backend distinguishes empty list vs missing field
B2 Auto-discover channels from TM registry:
- New GET /api/v1/files/tm/channels endpoint reads distinct channels from registry
- Frontend StepConfigure fetches channels per client; falls back to static list
- Pipeline TM resolution falls back to flat_<Channel>_<lc>.json pattern for any
registered channel (no hardcoded map needed for new channels like PrimeCBM)
B3 Job inputs visible on monitoring:
- New "Inputs sent to the agent" card on /jobs/[id] showing AI model, TM files,
supplementary file list, and context override
- New GET /api/v1/jobs/{id}/supplementary endpoint listing on-disk supplementary files
C1 Context cap (large briefs truncating):
- max_tokens scales with source line count (8k/16k/32k/64k by tier)
- 172-line briefs now have ~64k output budget instead of fixed 16k
D1 Reviewer comments in xlsx export:
- Export endpoint now copies xlsx to temp path on download, queries Feedback
joined with User, and appends "Reviewer (Name): comment" to the rationale
cells of options that have feedback
- Original generated file remains untouched
D2 Hide Clients & Voice from sidebar (page still reachable by URL)
D3 Remove dead notifications + settings icons from header
D4 Cost by Locale table added to Analytics with total + avg cost per brief
Makefile seed target now also runs register_storage_files so TM registry is
populated from disk on first setup (deploy.sh already does this via --init).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add `await db.refresh(user)` after `db.flush()` in create_user and
update_user so server-generated `updated_at` is available before
model_validate (async SQLAlchemy cannot lazy-load expired attributes)
- Add DialogDescription to satisfy Radix UI aria requirement
- Wrap form fields in <form> to resolve browser password-not-in-form warning
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add viewer role to backend enum + Alembic migration
- SSO auto-provisioned users now get viewer (lowest privilege) by default
- Wire admin/users page to real API (replace mock data), with add/edit/deactivate
- Fix frontend UserRole enum to match backend (TM_MANAGER, REVIEWER)
- Replace hardcoded mock user in Sidebar with real auth, filter admin-only nav items, wire logout
- Add seed script to set default admins (daveporter, vadymsamoilenko)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Backend: Azure AD JWKS validator with 24h cache, new POST /api/v1/auth/sso/login
endpoint, sso_login() in AuthService with auto-provisioning, password_hash made
nullable, auth_provider column added, Alembic migration c1d2e3f4a5b6
- Frontend: @azure/msal-browser, msal.ts config singleton, ssoLogin() API function,
login page updated with SSO button and redirect callback handling
- Deploy: frontend Dockerfile and docker-compose.prod.yml updated to bake Azure AD
vars into the image at build time; deploy.sh validates SSO config on init/deploy
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The V25 table has duplicate column names (Backtranslation x3, Rationale x3).
The dict-based parser collapsed these — only the last value survived (Option 3's
"N/A"), causing all BT/rationale fields to be "N/A" in the output Excel.
Fixed by switching to positional list-based parsing instead of dicts.
Also adds per-job model selection (Sonnet 4.6 / Opus 4.6) through the full
stack: DB column, API schema, job wizard UI dropdown, pipeline contracts, and
LLM client with model-aware cost tracking. Includes Alembic migration.
Updated help page and README to reflect single-agent pipeline, multi-TM
selection, flat locale grid, model selector, and linguistic summary.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Four changes from user testing feedback:
1. Merge MAIN/DERIVED locale selectors into single 12-locale grid, auto-classify locale_type
2. Add multi-TM channel selection (checkbox grid, tm_channels JSON column, multi-file resolution)
3. Replace 6-agent pipeline with single V25-based agent (feature-flagged via USE_SINGLE_AGENT)
4. Replace Excel Tab 2 metadata with linguistic summary from agent output
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Backend: Added confidence_high/moderate/low/total_output_rows to
JobListResponse, computed via a batch query joining output_rows
- Frontend JobCard: Shows a stacked progress bar with green/amber/red
segments and counts for High/Moderate/Low confidence tiers
- Frontend StepConfigure: Auto-selects Amazon as default client when
creating a new job (falls back to first client if Amazon not found)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pipeline stores tm_entries_cited as a list[str] of seg_keys, but the
Pydantic response schema expected dict[str, Any], causing a validation
error when loading the output preview page.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Only value/mass/onsite/outbound were mapped, so jobs with channel=UEFA
got "Unknown channel" and fell back to no TM matches, causing all LOW
confidence scores.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The rerun endpoint returned 500 because Pydantic tried to serialize
updated_at from a stale SQLAlchemy instance after flush(). Added
db.refresh(instance) to ensure all attributes are loaded.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The compact TM format parser was storing the combined EN+TX text in both
fields, causing the LLM retrieval agent to fail at matching source lines
against TM entries — resulting in all-low confidence tiers. Added
_split_en_tx() heuristic that detects the language boundary at the first
non-ASCII sentence. Also includes raw _text in LLM prompt for context.
Fixed get_jobs_over_time GroupingError by using literal_column for
date_trunc, added date filters to status_breakdown, and fixed Decimal
serialization in locale stats.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix sidebar nav so Dashboard/Monitoring and Audit Trail/System Logs
highlight independently by using useSearchParams to distinguish
query-param-based routes. Fix get_jobs_over_time SQL GroupingError
by using literal_column for date_trunc interval. Add date filters to
status_breakdown query and fix Decimal serialization in locale stats.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace mock chart data on reports page with real backend queries (jobs over
time, locale stats, usage stats, quality metrics). Add audit logging to auth
(login/login_failed), file management (upload/delete TM and reference files),
and feedback submission so the system logs page shows complete activity.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
StepUpload was showing hardcoded "42 Total Lines, 8 Display Formats"
for every file upload. Now:
- Added POST /jobs/validate-source endpoint that parses xlsx in a
temp file and returns real stats (line count, display formats,
columns found, warnings) without creating any DB records
- Frontend calls validateSource() when user selects a file
- Shows spinner during validation, real results after
- Blocks "Next" if validation fails
- Removed all mock validation data
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix API path: frontend now calls /audit/logs (was /audit)
- Backend eagerly loads User relationship for audit entries
- Backend response includes user_name field instead of just user_id
- Frontend logs page fetches real data with pagination
- Derive INFO/WARN/ERROR levels from action type
- Format details JSON into readable descriptions
- Add loading state and empty state handling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Feedback was saving to DB but never loaded back on page revisit.
Three-point fix:
- Backend schema: add feedback list to OutputRowResponse
- Backend service: eagerly load feedback relationship in preview query
- Frontend mapper: map latest feedback entry to OutputRow.feedback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wire token usage from LLM agents through pipeline context to DB and frontend
- Agents 2 and 4 accumulate input/output tokens and cost into PipelineContext
- job_tasks.py saves token totals to locale instance after pipeline completion
- Monitoring cards show total tokens and estimated cost instead of broken 0/0
- Make feedback highlighting bolder: colored card borders, stronger button states
- Add estimated cost display to dashboard job cards
- Add Help page with full documentation and link in sidebar navigation
- Comprehensive README with ASCII architecture diagrams
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all stub agents with working Claude API-powered agents:
- Agent 2 (TM Retrieval): LLM semantic matching of source lines against TM entries
- Agent 3 (Ranker): Deterministic ranking with confidence tiers (high/moderate/low)
- Agent 4 (Transcreator): Batched creative transcreation with voice profiles, reference files, backtranslations
- Agent 5 (Compliance): Deterministic checks for character limits, blacklist terms, domain substitution
Also fixes TM file loader to handle real compact JSONL format (locale code regex-based parsing),
and adds file manifest resolution for reference files (glossary, blacklist, TOV, locale considerations).
Verified end-to-end: 53-line de-DE brief produces real German translations with TM matching,
confidence-based option counts (1/2/3), backtranslations, and compliance validation. ~$0.49 total cost.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Job wizard now calls real API: create job → upload source → launch
- Dashboard and monitoring pages use live data instead of mock data
- Monitoring page polls every 3s while job is active
- Backend enriches job responses with client_name, created_by_name,
source_line_count from eager-loaded relationships
- Frontend response mappers handle backend→frontend type differences
(lowercase enum values, field name mapping, computed progress/stage)
- Source file parser accepts column aliases (Line type, Context notes)
with case-insensitive matching for real-world Excel files
- Clients list endpoint accessible to all authenticated users
- Fixed uploadSource to use PUT, uploadSupplementary per-file
- Removed all hardcoded mock data from useJobs hook
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- deploy.sh: one-command deploy script (--init for first time, bare for updates)
- docker-compose.prod.yml: production stack with nginx, multi-worker uvicorn, no volume mounts for code
- nginx/nginx.conf: reverse proxy with rate limiting, WebSocket support, static asset caching
- Fix login to use real backend API instead of mock localStorage tokens
- Add auth guard to AppShell (prevents flash-of-content on unauthenticated routes)
- JWT claims decoded client-side for user info (no extra /me call needed)
- Switch logo from missing .jpeg to .svg
- Frontend API URL defaults to same-origin (works behind nginx without CORS)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>